Is Transfer Learning Really Effective? Investigating the Effectiveness of Transfer Learning on Medical Imaging Tasks!
3 main points
✔️ Analyzes the effect of transfer learning on medical imaging tasks from multiple perspectives
✔️ Transfer learning is not very effective in improving the classification accuracy of medical imaging tasks
✔️ Transfer learning of only the lowest layer can significantly improve convergence speed
Transfusion: Understanding Transfer Learning for Medical Imaging
written by Maithra Raghu, Chiyuan Zhang, Jon Kleinberg, Samy Bengio
(Submitted on 14 Feb 2019 (v1), last revised 29 Oct 2019 (this version, v3))
Comments: NeurIPS 2019
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
The images used in this article are from the paper, the introductory slides, or were created based on them.
With the development of deep learning, it is now also applied to medical imaging tasks (e.g., medical image segmentation, lesion detection) and outperforms conventional methods.
In medical imaging tasks, transfer learning with models pre-trained on ImageNet is often used to improve performance.
However, as shown in the figure below, general images such as those included in ImageNet and medical images have very different characteristics. For example, compared to general images, medical images have higher resolution and must be classified based on local features alone.
Because of this difference in characteristics between ImageNet images and medical images, the question "Is transfer learning with models pre-trained on ImageNet effective for medical imaging tasks?" has long been raised, but our understanding of the effectiveness of transfer learning has remained limited.
Therefore, this paper analyzes the effect of transfer learning on medical imaging tasks from various perspectives.
As a result, the authors found that transfer learning is not very effective in improving classification accuracy on medical imaging tasks. They also claim that accuracy equivalent to that of a large-scale model with transfer learning can be achieved with a simple, small-scale model.
They further investigated the effect of transfer learning on convergence speed and found that transfer learning of only the lowest layer can significantly improve convergence speed.
In this article, we present the results of their analysis of the effect of transfer learning on medical imaging tasks.
We use the following two large-scale models (ResNet50 and Inception-v3).
The following four small-scale models (CBR models) are used.
The architecture of the small-scale models stacks Conv, Batch Normalization, and ReLU layers, as shown next.
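As a rough illustration, a CBR-style model of this kind can be sketched in PyTorch as follows. The channel widths, depth, and pooling layers here are illustrative assumptions, not the exact configurations from the paper:

```python
import torch.nn as nn

def cbr_block(in_ch, out_ch):
    """One Conv -> Batch Normalization -> ReLU unit, the building block
    that the small-scale CBR models repeatedly stack."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
    )

# A minimal CBR-style classifier; the sizes are hypothetical,
# chosen only to show the stacking pattern.
model = nn.Sequential(
    cbr_block(3, 32),
    nn.MaxPool2d(2),
    cbr_block(32, 64),
    nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 2),  # e.g. a two-class diagnosis output
)
```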
The following two medical image datasets are used:
- Retinal fundus photographs (RETINA data)
  - Fundus image dataset
  - Labeled in two stages of diabetic retinopathy according to the progression of symptoms
- Chest X-ray images (CheXpert)
  - Dataset of chest X-ray images
  - Classified into five diseases of the chest
Effectiveness analysis of transfer learning
We analyzed the effect of transfer learning on medical imaging tasks from four perspectives: classification accuracy, feature similarity, feature visualization, and convergence speed.
We first verify the effectiveness of transfer learning based on classification accuracy, using AUC as the evaluation metric.
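For reference, AUC here is the area under the ROC curve, i.e., the probability that a random positive example receives a higher score than a random negative one. With scikit-learn it can be computed from labels and predicted scores roughly like this (the labels and scores below are made-up toy values):

```python
from sklearn.metrics import roc_auc_score

# Toy ground-truth labels and predicted scores for the positive class.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# 3 of the 4 positive/negative pairs are ordered correctly,
# so the AUC is 0.75.
auc = roc_auc_score(y_true, y_score)
print(auc)  # 0.75
```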
The classification accuracy with RETINA data was as follows.
First, comparing the random initialization (Random Init) and transfer learning (Transfer) cases, we can see that classification accuracy is hardly improved by transfer learning.
Comparing the large models (ResNet50, Inception-v3) with the small models (CBR-xxx), we can see that their classification accuracies are almost the same.
Furthermore, the small-scale models show good classification accuracy on the RETINA data even when their classification accuracy on ImageNet (IMAGENET Top5) is not good.
The classification accuracy on CheXpert is as follows.
On CheXpert, as with the RETINA data, we can see that classification accuracy is hardly improved by transfer learning.
Similarity of features
Next, we quantify how the features change during training and verify the effect of transfer learning based on that change.
To quantify the changes in the features, we use a measure of feature similarity across models called CCA similarity.
We examine the change of the features before and after training. The CCA similarity of the features before and after training is compared between the randomly initialized case (blue, RandInit) and the transfer learning case (yellow, Pretrained) as follows.
For both RandInit and Pretrained, CCA similarity is higher in layers closer to the input and lower in layers closer to the output. The difference in CCA similarity between RandInit and Pretrained (gray) is larger in the two layers closest to the input.
From this result, we can say that the pre-trained weights are effectively reused in the layers close to the input. Therefore, transfer learning appears to be effective mainly in the layers close to the input layer.
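The idea behind CCA similarity can be sketched with plain NumPy: given the activations of the same inputs from two networks (or from one network before and after training), the mean canonical correlation measures how similar the two representations are up to linear transforms. This is a simplified sketch of the idea, not the paper's exact (SV)CCA implementation:

```python
import numpy as np

def cca_similarity(X, Y):
    """Mean CCA correlation between two activation matrices.

    X, Y: arrays of shape (num_datapoints, num_neurons) holding the
    activations of the same inputs in two networks or training states.
    """
    # Center each neuron's activations.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Orthonormal bases for the two activation subspaces.
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    # Singular values of Qx^T Qy are the canonical correlations.
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(rho.mean())

rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 20))
# A layer is maximally similar to any invertible linear remix of itself ...
same = cca_similarity(acts, acts @ rng.normal(size=(20, 20)))
# ... and much less similar to unrelated random activations.
diff = cca_similarity(acts, rng.normal(size=(500, 20)))
print(same, diff)
```

Because CCA is invariant to invertible linear transforms, it compares what the layers represent rather than how individual neurons happen to be arranged.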
Next, we visualize the features before and after training and verify the effect of transfer learning based on how they change.
The following is a visualization of the CNN filters before and after training.
First, we see that for the large-scale model, the filters change little under both random initialization (a to b) and transfer learning (c to d).
Next, we see that for the small-scale model, the appearance of the filters changes under both random initialization (e to f) and transfer learning (g to h). However, in the transfer learning case (g to h), some of the filters learned during pre-training that detect lines, edges, etc. are lost after training.
Finally, we test the effect of transfer learning on convergence speed.
The relationship between the number of training steps needed to reach AUC > 0.91 (convergence speed) and the layers used for transfer learning is as follows. (For example, Block1 denotes the model in which weights from the input layer of ResNet50 through the first block were transferred.)
We can see that convergence speed improves as the number of layers used for transfer learning increases.
In addition, we can see that transfer learning of only Conv1, the lowest layer, significantly improves convergence speed.
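Transferring only the lowest layer amounts to copying just that layer's weights from a pretrained network into a freshly initialized one. A minimal PyTorch sketch of the idea, using a small stand-in network and a randomly initialized "donor" so the example runs offline (in the paper, the donor would be an ImageNet-pretrained ResNet50):

```python
import torch
import torch.nn as nn

def make_model():
    # Small stand-in network; in the paper this would be ResNet50.
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
        nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2),
    )

donor = make_model()  # stands in for an ImageNet-pretrained model
model = make_model()  # randomly initialized target network

# Transfer only the lowest convolution (the "Conv1 only" setting);
# every other layer keeps its random initialization.
model[0].load_state_dict(donor[0].state_dict())

# Sanity check: the first conv now matches the donor, later layers do not.
assert torch.equal(model[0].weight, donor[0].weight)
assert not torch.equal(model[3].weight, donor[3].weight)
```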
In this article, we introduced a paper on transfer learning in medical imaging tasks. The paper confirms that transfer learning is not very effective in improving the classification accuracy of medical imaging tasks, but that it can significantly improve convergence speed.
In the future, we expect the development of medical image datasets comparable in volume to general image datasets, as well as progress in research on architectures tailored to medical imaging tasks.