
Location Matters In Medical Images! Contrastive Learning For Medical Image Segmentation


Contrastive Learning

3 main points
✔️ Contrastive Learning for Medical Image Segmentation
✔️ Determine positive/negative example pairs based on the positional relationship between images
✔️ Semi-supervised learning and transition learning tasks achieve segmentation accuracy that exceeds existing methods.

Positional Contrastive Learning for Volumetric Medical Image Segmentation
written by Dewen Zeng, Yawen Wu, Xinrong Hu, Xiaowei Xu, Haiyun Yuan, Meiping Huang, Jian Zhuang, Jingtong Hu, Yiyu Shi
(Submitted on 16 Jun 2021 (v1), last revised 28 Sep 2021 (this version, v3))
Comments: Published on arXiv.

Subjects: Computer Vision and Pattern Recognition (cs.CV)


The images used in this article are from the paper, the introductory slides, or were created based on them.

Introduction

Self-supervised learning can be trained on unlabeled image data and is regarded as a very effective method for medical imaging tasks where it is difficult to obtain annotated data.

Contrastive learning, exemplified by well-known methods such as SimCLR and MoCo, is a form of self-supervised learning that has been particularly successful in the field of computer vision.

Briefly, contrastive learning is self-supervised learning that aims to "bring the features of positive example pairs closer together and push the features of negative example pairs farther apart". The features obtained after training are used in downstream tasks (image classification, object detection, segmentation, etc.) to improve accuracy.

In contrastive learning, how positive and negative example pairs are determined is important. In existing methods, a positive pair is formed between an image and its data-augmented version, and negative pairs are formed between different images.

On the other hand, if we try to use the existing methods for medical imaging tasks as they are, problems arise concerning positive and negative example pairs. As a result, the segmentation accuracy in the downstream tasks will be degraded.

Specifically, the problem is that inappropriate negative example pairs are created. In other words, negative pairs are formed between images whose features should not be pushed apart, so correct learning cannot be performed.

This problem is caused by the presence of the same tissue or organ across multiple images in the dataset. In other words, even though the images are almost identical in appearance, the features are learned to move away from each other as negative example pairs.

In this paper, the authors propose a new contrastive learning method for medical image segmentation, Positional Contrastive Learning (PCL), as shown in the figure below. PCL solves the above problem by effectively exploiting the positional relationships in medical images to determine the positive/negative example pairs.

When PCL is compared with existing pre-training methods on semi-supervised and transfer learning tasks for medical image segmentation, PCL achieves better segmentation accuracy. Furthermore, PCL is more effective when the number of annotated images is small.

This article describes PCL and introduces experimental results showing its usefulness.

Proposed method: Positional Contrastive Learning

First, we introduce self-supervised learning with PCL.

The above figure shows an overview of PCL. As shown in the figure, a 2D image (x-y plane image) cut along the z direction from a 3D image is used as input data.
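As a rough sketch (array shapes are illustrative, not taken from the paper), cutting a 3D volume into 2D axial slices along the z direction might look like this:

```python
import numpy as np

# Toy 3D volume with axes (z, y, x): e.g. a cardiac CT with 16 axial slices.
volume = np.random.rand(16, 128, 128)

# Cut 2D (x-y plane) images along the z direction, as PCL uses for input.
slices = [volume[z] for z in range(volume.shape[0])]

print(len(slices), slices[0].shape)  # 16 (128, 128)
```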

This section introduces how to determine the positive and negative example pairs, which is the key to PCL. First, a position is defined for each 2D medical image: position = m/n (a value between 0 and 1), where n is the number of 2D images cropped from the 3D image and m is the index along the z direction at which the 2D image is cropped (0 < m < n).

Based on this position, positive and negative example pairs are determined as follows.

  • Positive pair: 2D images whose difference in position is within a certain threshold
  • Negative pair: 2D images whose difference in position exceeds the threshold

The important thing to note here is that pairs are determined solely by the difference in position. 2D images clipped from the same 3D image and 2D images clipped from different 3D images are treated as a positive example pair as long as the difference in position does not exceed a threshold value. In this way, even if the same tissue or organ exists across multiple images, the feature values of these images can be brought closer together.
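A minimal sketch of this pair-assignment rule, assuming an illustrative threshold of 0.1 (the actual threshold is a hyperparameter of the method):

```python
def position(m, n):
    """Normalized slice position: the m-th of n slices, a value in [0, 1)."""
    return m / n

def is_positive_pair(pos_a, pos_b, threshold=0.1):
    """Two slices form a positive pair if their positions are within the
    threshold -- regardless of whether they come from the same 3D volume."""
    return abs(pos_a - pos_b) <= threshold

# Slice 3 of a 20-slice volume vs. slice 5 of a 30-slice volume:
a = position(3, 20)   # 0.15
b = position(5, 30)   # ~0.167
print(is_positive_pair(a, b))  # True: positions differ by far less than 0.1
```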

The loss function in PCL is as follows

It is almost the same as the usual contrastive loss, but the loss function in PCL averages the contrastive loss over all positive example pairs for each image in the batch. Equation (1) is the sum of the per-image losses in equation (2) over the entire dataset. Data augmentation is applied to each image, as in standard contrastive learning.
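As a hedged illustration (not the paper's exact implementation), the per-batch loss could be sketched in NumPy as follows, with positives chosen by position difference and `tau` as an assumed temperature hyperparameter:

```python
import numpy as np

def pcl_loss(features, positions, threshold=0.1, tau=0.1):
    """Contrastive loss where the positives of each anchor are all other
    batch entries whose normalized position is within `threshold`.
    `features` is (N, d) and assumed L2-normalized; `positions` is (N,)."""
    n = features.shape[0]
    sim = features @ features.T / tau
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity from the softmax
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos_mask = np.abs(positions[:, None] - positions[None, :]) <= threshold
    np.fill_diagonal(pos_mask, False)
    # Average the loss over all positive pairs of each anchor, then over anchors.
    losses = []
    for i in range(n):
        if pos_mask[i].any():
            losses.append(-log_prob[i, pos_mask[i]].mean())
    return float(np.mean(losses))

# Toy batch: slices 0/1 and 2/3 are close in position, so they pair up.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
loss = pcl_loss(feats, np.array([0.10, 0.12, 0.80, 0.82]))
print(loss > 0)  # the loss is a positive scalar
```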

Next, we describe fine-tuning.

In fine-tuning, a U-Net is used as the segmentation model, and the weights obtained by PCL are used as the initial weights of the U-Net encoder. (The self-supervised comparison methods are fine-tuned in the same way as PCL.)
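This weight transfer can be sketched with plain dictionaries; the parameter names (`encoder.`, `decoder.`) are hypothetical stand-ins, not the actual names used in the paper's model:

```python
def load_encoder_weights(unet_state, pcl_state, prefix="encoder."):
    """Copy every pre-trained parameter whose name starts with `prefix`
    into the segmentation model's parameter dict; the decoder keeps its
    random initialization."""
    updated = dict(unet_state)
    for name, weight in pcl_state.items():
        if name.startswith(prefix) and name in updated:
            updated[name] = weight
    return updated

unet = {"encoder.conv1": "random", "decoder.conv1": "random"}
pcl = {"encoder.conv1": "pretrained"}
print(load_encoder_weights(unet, pcl))
# {'encoder.conv1': 'pretrained', 'decoder.conv1': 'random'}
```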



data set

  • CHD: 68 cardiac CT 3D images; segmentation of 7 tissues (left ventricle, right ventricle, left atrium, right atrium, myocardium, aorta, pulmonary artery)
  • MMWHS: 20 cardiac CT images & 20 cardiac MR images; 7-tissue segmentation
  • ACDC: cardiac MR images from 100 patients; segmentation of 3 tissues (left ventricle, right ventricle, myocardium)
  • HVSMR: 10 cardiac MR images; blood pool and myocardial segmentation

Comparative approach

As comparison methods for PCL, existing self-supervised learning approaches are used for pre-training.

Semi-supervised learning

After performing self-supervised learning, we performed supervised learning (semi-supervised learning) on a small number of annotated data.

CHD and ACDC were used as the datasets, and 5-fold cross-validation was performed on each.
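A minimal sketch of a patient-level 5-fold split (illustrative only, not the authors' exact splitting code):

```python
def five_fold_splits(patient_ids, k=5):
    """Partition patients into k folds; each fold serves once as validation
    while the remaining folds form the training set."""
    folds = [patient_ids[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, val

patients = list(range(10))
splits = list(five_fold_splits(patients))
print(len(splits))   # 5
print(splits[0][1])  # first validation fold: [0, 5]
```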

From the experimental results of semi-supervised learning, we found the following.

  • PCL achieves the best segmentation accuracy compared to existing methods in the semi-supervised learning task.
  • When the amount of annotated data was small (M = 2, 6), the accuracy gap over existing methods was large.

Transfer learning

The dataset combinations used for transfer learning and fine-tuning are CHD and MMWHS, and ACDC and HVSMR. The CHD and MMWHS datasets have high domain similarity, while ACDC and HVSMR have low domain similarity.

From the experimental results of transfer learning, we found the following.

  • PCL achieved the best segmentation accuracy compared to existing methods in the transfer learning task.
  • When the domains of the datasets used for transfer learning and fine-tuning were less similar (ACDC and HVSMR), the accuracy gap between PCL and existing methods was smaller.


Summary

In this article, we introduced Positional Contrastive Learning, a contrastive learning method that effectively uses the positional relationships of medical images. It is a very interesting study that exploits positional relationships unique to medical images.

However, there seems to be room for improvement, as PCL is less effective when domains are less similar.

In addition, self-supervised learning is very powerful as a countermeasure against the lack of data in annotated medical images. Therefore, the research of self-supervised learning methods for medical images such as the one in this paper will be further developed in the future.
