
Joint Weakly and Semi-Supervised Deep Learning for Localization and Classification of Masses in Breast Ultrasound Images

Shin S.Y., Lee S., Yun I.D., Kim S.M., Lee K.M.

Background

Breast ultrasound (BUS) imaging aims to detect abnormalities such as masses and classify them as either benign or malignant. We present a method for simultaneously localizing and classifying masses in BUS images. We train a convolutional neural network (CNN) for regression of bounding box positions and classification of masses, which can then be used to assign a per-image diagnostic label. Training such a model typically requires a strongly supervised dataset, i.e., one that includes the positions and class labels of bounding boxes. While a larger dataset helps to avoid overfitting and maximize performance, obtaining expert annotations requires considerable time and cost. A dataset with only weak annotations, e.g., image-level labels, which is often all that is available for BUS images, may be insufficient to train a model regardless of its size.

Our Contribution

We present a method to localize and classify masses from BUS images by training a CNN on a relatively small dataset with strong annotations and a large dataset with weak annotations in a hybrid manner.

The main contributions of our work are the development of 1) a one-shot method for the concurrent localization and classification of masses present in BUS images and 2) a systematic weakly and semi-supervised training scenario for using a strongly annotated dataset, DXLoc, and a weakly annotated dataset, DX, with appropriate training loss selection.
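To make the one-shot design concrete, the sketch below shows how a single forward pass can yield both a mass bounding box and a benign/malignant decision, from which a per-image label follows. It is a hedged illustration, not the authors' code: torchvision's off-the-shelf Faster R-CNN serves as a stand-in detector, and the three-class head (background/benign/malignant) and the rule of taking the most confident detection as the image-level label are assumptions made here for illustration.

```python
# Hedged sketch: single-pass mass localization + classification with a
# generic Faster R-CNN stand-in (not the authors' trained model).
import torch
import torchvision

# Assumption: a 3-class head (background, benign, malignant) stands in for the
# paper's mass classifier; weights are left random, for illustration only.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=None, weights_backbone=None, num_classes=3)
model.eval()

image = torch.rand(3, 400, 400)          # placeholder BUS image tensor
with torch.no_grad():
    pred = model([image])[0]             # boxes, labels, scores for one image

if len(pred["scores"]) > 0:
    best = pred["scores"].argmax()
    box = pred["boxes"][best]            # predicted mass bounding box
    # Assumed label mapping: class 1 = benign, class 2 = malignant.
    label = "benign" if pred["labels"][best].item() == 1 else "malignant"
    print(f"per-image label: {label}, box: {box.tolist()}")
else:
    print("no mass detected")
```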

Figure 1. Example of breast ultrasound images with masses. Bounding boxes with solid and dashed lines respectively represent ground truths (GT) and detections using the proposed method. Boxes are colored as blue (red) if the GT or predicted label is benign (malignant). Figure best viewed in color.

Figure 2. Illustration of the proposed framework. (a) Images from two different data streams are forward-propagated into a shared network. (b) The Faster R-CNN [34] used for the “network” of (a). The network is composed of the region proposal network (RPN) and Fast R-CNN [35] with shared convolutional layers. This figure was previously presented in [34] and is reprinted in this paper for the description of the Faster R-CNN. We also note that the proposed method is a general framework; hence, other supervised approaches can also be adopted. (c) An image-level loss is used for images from DX, whereas region-level losses are used for images from DXLoc. Refer to Subsections II-B and II-C for details.
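The loss selection in (c) can be summarized as follows: a training sample contributes region-level losses (box regression plus classification) when it comes with a ground-truth box, i.e., from DXLoc, and only an image-level classification loss when it carries an image-level label alone, i.e., from DX. The following toy sketch illustrates that rule; the tiny backbone, heads, and loss terms are placeholders, not the paper's Faster R-CNN losses.

```python
# Hedged sketch of the loss-selection rule in Fig. 2(c): strongly annotated
# images (DXLoc) contribute region-level losses, weakly annotated images (DX)
# contribute only an image-level classification loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
box_head = nn.Linear(8, 4)     # stand-in for region-level box regression
cls_head = nn.Linear(8, 2)     # benign / malignant

def training_loss(image, target):
    feat = backbone(image.unsqueeze(0))
    logits = cls_head(feat)
    if "box" in target:                      # strongly annotated (DXLoc)
        loss_box = F.smooth_l1_loss(box_head(feat), target["box"].unsqueeze(0))
        loss_cls = F.cross_entropy(logits, target["label"].unsqueeze(0))
        return loss_box + loss_cls           # region-level losses
    # weakly annotated (DX): image-level label only
    return F.cross_entropy(logits, target["label"].unsqueeze(0))

strong = {"box": torch.tensor([10., 20., 60., 80.]), "label": torch.tensor(1)}
weak = {"label": torch.tensor(0)}
img = torch.rand(1, 128, 128)
print(training_loss(img, strong).item(), training_loss(img, weak).item())
```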

Results

Experimental results show that the proposed method can successfully localize and classify masses with less annotation effort. Results obtained by training with only 10 strongly annotated images plus the weakly annotated images were comparable to those obtained with 800 strongly annotated images (95% confidence interval (CI) of the difference: −3% to 5%) in terms of the correct localization (CorLoc) measure, i.e., the proportion of images whose detection overlaps the ground truth with an intersection over union higher than 0.5.
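For reference, CorLoc as defined above can be computed directly from per-image IoU values. The sketch below is a minimal illustration with made-up boxes, not the evaluation code used in the paper; boxes are assumed to be in [x1, y1, x2, y2] format.

```python
# Hedged sketch of the CorLoc measure: the fraction of images whose detection
# overlaps the ground-truth box with IoU greater than 0.5.
import numpy as np

def iou(a, b):
    # Intersection over union of two [x1, y1, x2, y2] boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def corloc(gt_boxes, pred_boxes, thresh=0.5):
    hits = [iou(g, p) > thresh for g, p in zip(gt_boxes, pred_boxes)]
    return np.mean(hits)

gt = [[10, 10, 50, 50], [20, 30, 80, 90]]       # made-up ground-truth boxes
pred = [[12, 8, 52, 48], [60, 60, 100, 100]]    # made-up detections
print(corloc(gt, pred))                          # 0.5 in this toy example
```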

Table 4. The initial network is applied to the images in the weakly supervised set, and images whose highest classification probabilities agree with the image-level label are automatically selected and moved to the strongly supervised set, along with their most confident detections as the ground-truth (GT) masses of interest (MoI). Retraining is then conducted with the reconfigured strongly and weakly supervised sets. Bootstrapping is used to sample subset CorLoc values, from which 95% confidence intervals (CI) are computed.
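The selection step summarized in Table 4 can be read as a standard self-training rule: keep only weakly labeled images whose predicted class agrees with the image-level label, rank them by confidence, and promote the top ones with their best detection as a pseudo ground truth. The sketch below is a hedged illustration; the data structures, the agreement check, and the top-k cutoff are assumptions made here, not the paper's exact criteria.

```python
# Hedged sketch of the self-training selection step described in Table 4.
def promote_confident(weak_set, predictions, top_k=100):
    """weak_set: list of dicts with 'image' and an image-level 'label'.
    predictions: per-image lists of detections, each with box/label/score."""
    candidates = []
    for item, dets in zip(weak_set, predictions):
        if not dets:
            continue
        best = max(dets, key=lambda d: d["score"])   # most confident detection
        if best["label"] == item["label"]:           # agrees with image label
            candidates.append((best["score"], item, best))
    candidates.sort(key=lambda c: c[0], reverse=True)
    promoted = [{"image": item["image"], "label": item["label"],
                 "box": det["box"]}                  # pseudo GT mass of interest
                for _, item, det in candidates[:top_k]]
    promoted_ids = {id(item) for _, item, _ in candidates[:top_k]}
    remaining = [item for item in weak_set if id(item) not in promoted_ids]
    return promoted, remaining

# Toy usage with made-up weak labels and detections.
weak = [{"image": "img_001.png", "label": "benign"},
        {"image": "img_002.png", "label": "malignant"}]
preds = [[{"box": [5, 5, 40, 40], "label": "benign", "score": 0.93}],
         [{"box": [10, 10, 60, 60], "label": "benign", "score": 0.40}]]
promoted, remaining = promote_confident(weak, preds, top_k=1)
print(len(promoted), len(remaining))   # 1 promoted, 1 left in the weak set
```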

Table 5. Bootstrapping is used to sample subset CorLoc values, from which 95% confidence intervals (CI) and p-values indicating the statistical significance of improvement are computed. P-values are obtained using a paired t-test for each compared method.
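The statistics in Tables 4 and 5 follow a standard recipe: resample the per-image correct-localization indicators with replacement and take the 2.5th and 97.5th percentiles of the resampled means for a 95% bootstrap CI, and run a paired t-test on matched per-image outcomes of two methods. A minimal sketch, assuming toy indicator data:

```python
# Hedged sketch of a bootstrap CI and paired t-test on per-image CorLoc
# indicators (1 = correctly localized, 0 = not). Data below are made up.
import numpy as np
from scipy import stats

def bootstrap_ci(per_image_hits, n_boot=10000, seed=0):
    rng = np.random.default_rng(seed)
    hits = np.asarray(per_image_hits, dtype=float)
    means = [rng.choice(hits, size=hits.size, replace=True).mean()
             for _ in range(n_boot)]
    return np.percentile(means, [2.5, 97.5])   # 95% percentile bootstrap CI

method_a = np.array([1, 1, 0, 1, 1, 0, 1, 1, 1, 0], dtype=float)
method_b = np.array([1, 0, 0, 1, 1, 0, 1, 0, 1, 0], dtype=float)
print("95% CI (method A):", bootstrap_ci(method_a))
print("paired t-test p-value:", stats.ttest_rel(method_a, method_b).pvalue)
```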

Figure 6. Qualitative results on the SNUBH dataset. Each row shows different images. Each of the top three rows presents a case with various types of masses, which can be small, large, or unclear. The bottom two rows present failure cases where either localization or classification fails. Bounding boxes with solid and dashed lines respectively represent ground truths (GT) and detections using the proposed method. Boxes are colored as blue (red) if the GT or predicted label is benign (malignant). Figure best viewed in color.

Figure 8. Qualitative results on the UDIAT-DXLoc-Ts. Each row shows different images. Bounding boxes with solid and dashed lines respectively represent ground truths (GT) and detections using the proposed method. Benign and malignant cases are each presented in two rows. Boxes are colored as blue (red) if the GT or predicted label is benign (malignant). Figure best viewed in color.

Paper

Triplanar convolution with shared 2D kernels for 3D classification and shape retrieval

Kim, E.Y.; Shin, S.Y.; Lee, S.; Lee, K.J.; Lee, K.H.; Lee, K.M.

Computer Vision and Image Understanding, Volume 193, April 2020, 102901.

[link] [pdf] [Bibtex]


