Building An Accurate Model From A Small Number Of Images! A Proposed Classification Model For Tongue Images Using Transfer Learning.
3 main points
✔️ In TCM, tongue images and health status are considered closely related, and applying image analysis to automatic diagnosis is attracting attention, but the small number of available tongue image samples has been pointed out as an issue.
✔️ To address this shortage of samples, the paper investigates the performance of tongue image classification using transfer learning, which can achieve high accuracy even with a small number of samples.
✔️ In an evaluation of transfer learning models (ResNet and Inception_v3) on a dataset of 2245 tongue images, the proposed method achieves better classification accuracy than existing methods, which shows the effectiveness of transfer learning for tongue images.
Classifying Tongue Images using Deep Transfer Learning
written by Chao Song, Jiatuo Xu
(Submitted on June 2020)
Comments: Published by IEEE
Subjects: Transfer learning
Background
Is transfer learning an effective solution to the small number of samples in tongue images?
To clarify whether transfer learning is valid for tongue image analysis, this paper investigates three types of feature classification on a dataset of 2245 tongue images, using transfer learning models for image analysis (ResNet and Inception_v3). In TCM, the tongue and human health are considered closely related, but diagnosis depends heavily on the subjectivity of the diagnostician and lacks objectivity. For this reason, attention has focused on diagnostic technologies that use deep learning to automatically extract and learn features from tongue images and thereby compensate for this subjectivity. However, it has been pointed out that the number of available tongue image samples tends to be small. To address this issue, the study investigates the classification performance of transfer learning, which can achieve high accuracy even with a small number of samples. The proposed method selectively combines models with different learning rates, and the evaluation results show that it outperforms conventional methods in all three feature classifications, with an average accuracy of 95.92%. These results suggest that transfer learning achieves high accuracy in tongue image classification and can be a solution for eliminating the subjectivity of diagnosticians.
What is Oriental Medicine?
First of all, I will briefly explain Oriental medicine.
Oriental medicine is a traditional medicine with a history of about 2000 years, originating in ancient China. Oriental medicine differs from Western medicine, which has been introduced in many developed countries today: for example, while Western medicine directly approaches the bad part of the body through medication and surgery, Oriental medicine treats the disorder fundamentally from the inside. Oriental medicine is less burdensome and focuses on the root of the disease for a long-term cure. For this reason, it has been reported that oriental medicine is used in some cases to treat and improve intractable diseases that are difficult to treat with western medicine. Oriental medicine also uses herbal medicines, Chinese herbs, acupuncture, and moxibustion, and the often-heard term "acupuncture points" is an original concept of Oriental medicine.
Oriental medicine holds that organs are not independent but interrelated, and it diagnoses the body as a whole rather than specific organs. The concept based on this idea is "qi, blood, and water", which describes how the body is perceived in Oriental medicine: "qi" refers to the energy necessary for vital activity, "blood" to blood, and "water" to body fluids other than blood, such as lymph fluid and sweat. Health is defined as a state in which these three elements flow smoothly through the body without excess or deficiency. Because they affect each other, an abnormality in any one of them upsets the balance of the whole body and leads to ill health. Maintaining the balance of these elements is therefore said to maintain health. In particular, qi is considered the source of life force, and as the saying "illness begins with qi" suggests, the circulation of qi is often treated as the most important factor in treatment. The concept of the "five organs", which expresses this relationship from a more physical aspect, is explained in the next section.
What are the five organs?
The five viscera have the function of circulating qi, blood, and water, and are composed of five organs: liver, heart, spleen, lung, and kidney - these are different concepts from the so-called organs of Western medicine, although the word "viscera" is used (there is some overlap).
Each organ has its own specific function, and the smooth functioning of these organs helps maintain good health: the liver stores blood and controls the autonomic nervous system; the heart circulates blood and regulates sleep rhythms; the spleen supplies nutrients to the metabolism and muscles; the lungs circulate qi, blood, and water throughout the body and protect against external enemies; and the kidneys are responsible for growth, development, reproduction, aging, and excretion. By regulating these organs and maintaining the circulation of qi, it is believed that one can maintain good health and longevity. In addition, there is the concept of the six internal organs, which are regarded as the children of the five viscera and correspond to the liver, heart, spleen, lungs, kidneys, and heart capsule.
In addition, the concept of "Sang-Sheng, Sang-Ke" describes the relationships among the five viscera and is used as a guideline for actual treatment. As mentioned above, the five viscera do not work independently but are related to each other: Sang-Sheng is a relationship in which one viscus enhances another, and Sang-Ke is a relationship in which one suppresses another. For example, patients with cardiovascular diseases are treated by reinforcing the Liver in addition to the Heart itself. In this way, Oriental medicine is characterized by a way of thinking that emphasizes the flow of the whole, not just one organ.
What is tongue diagnosis?
Tongue diagnosis is one of the diagnostic methods used in Oriental medicine to diagnose diseases based on the shape and color of the tongue. In Oriental medicine, the state of the tongue reflects the state of health inside the body - internal organs, qi, blood, cold, heat, etc. - and the severity and progress of the disease.
On the other hand, traditional tongue diagnosis is largely based on the subjective observation of the practitioner, and there is a problem of bias due to personal experience and changes in the environment (lighting, etc.). In particular, the diagnosis of the constitution, which is emphasized in Oriental medicine, is difficult to generalize because of the subjectivity of the constitution itself. Therefore, many studies have been reported to propose frameworks for the evaluation and generalization of tongue images. For such generalization, much attention has been paid to the implementation of deep learning on tongue images.
Challenges to the implementation of deep learning in tongue images
In addition to these challenges, no large public dataset of tongue images currently exists, so small sample size is a problem. Deep learning methods usually require a large number of samples to extract valid image features, whereas in practice the number of tongue image samples is often small. Clarifying a method that can extract valid features from a small amount of sample data is therefore of great significance.
Purpose of this study
In this paper, we combine several transfer learning models that can achieve high performance with a small number of samples to extract features from tongue images, with the aim of developing automatic diagnosis and treatment techniques based on these images. To reduce the significant time and effort required for tongue image segmentation, we propose a classification method that combines a cascade classifier with deep transfer learning: specifically, a cascade classifier based on Local Binary Pattern (LBP) features automatically locates the tongue and automates segmentation. Furthermore, by combining multiple transfer learning models with different learning rates, we aim to achieve highly accurate classification and recognition.
Technique
Outline of the proposed method
The proposed model (Fig. 2) takes the tongue image segmented by the cascade classifier as the input to the neural network, extracts features from it using different deep learning models, and builds prediction models for three different tongue features. Of these steps, a) tongue segmentation and b) the transfer learning model are discussed below.
Tongue segmentation
To improve accuracy, this step removes the non-tongue parts of the image (segmentation) and automatically defines the tongue region.
In addition to the tongue itself, a captured tongue image often contains unrelated information such as the face and the background. Since this information can degrade the performance of tongue image analysis, the tongue body must be segmented first. Here, we propose a method for automatic localization and segmentation of the tongue region using a cascade classifier built on LBP features.
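In its standard form, the LBP operator is

$$\mathrm{LBP}(x_c, y_c) = \sum_{p=0}^{P-1} 2^{p}\, s(i_p - i_c)$$

where $(x_c, y_c)$ is the coordinate of the center pixel, $p$ indexes the $p$-th pixel in the neighborhood, $P$ is the number of neighborhood pixels (8 for a 3 × 3 neighborhood), $i_c$ is the gray value of the center pixel, $i_p$ is the gray value of the neighborhood pixel, and $s(x)$ is the sign function:

$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$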
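As an illustration of how an LBP code is obtained, the following is a minimal sketch in plain Python. It assumes the standard 8-neighbor operator, in which each neighbor is thresholded against the center pixel and the resulting bits are packed into one number. The function names and the clockwise neighbor ordering are choices made for this example and are not taken from the paper.

```python
def lbp_code(image, xc, yc):
    """Return the 8-neighbor LBP code of pixel (xc, yc) in a grayscale image.

    `image` is a list of rows of gray values; (xc, yc) must not lie on the border.
    """
    # Neighbor offsets (dx, dy), clockwise from the top-left pixel. The ordering
    # is an assumption of this sketch; it fixes which bit each neighbor sets.
    offsets = [(-1, -1), (0, -1), (1, -1), (1, 0),
               (1, 1), (0, 1), (-1, 1), (-1, 0)]
    ic = image[yc][xc]                  # gray value of the center pixel
    code = 0
    for p, (dx, dy) in enumerate(offsets):
        ip = image[yc + dy][xc + dx]    # gray value of the p-th neighbor
        s = 1 if ip - ic >= 0 else 0    # s(x) = 1 if x >= 0, else 0
        code += s << p                  # adds 2^p * s(ip - ic)
    return code


def lbp_image(image):
    """Compute LBP codes for every interior pixel of a grayscale image."""
    h, w = len(image), len(image[0])
    return [[lbp_code(image, x, y) for x in range(1, w - 1)]
            for y in range(1, h - 1)]


# A 3 x 3 toy patch with center value 6.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch, 1, 1))  # 241
```

In the toy patch, the neighbors with values 6, 7, 8, 9, and 7 are at least as bright as the center, so bits 0, 4, 5, 6, and 7 are set (1 + 16 + 32 + 64 + 128 = 241). A cascade classifier would then be trained on histograms of such codes; that training step is not shown here.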
Transfer learning model
In this section, we will describe the model used in the proposed method, which is based on transfer learning - a proposed model that combines three different neural networks.
In this study, the sample size and distribution of the selected tongue image dataset differ greatly from those of the ImageNet dataset used to train the original networks. It was therefore assumed that appropriate features might not be extracted from the tongue images as-is, and that the network model might fail to converge effectively because of problems such as overfitting or vanishing gradients.
Therefore, several improvements are introduced to improve the stability and performance of the network: (1) the output layer is replaced by global average pooling followed by a Dense layer with a softmax function, which regularizes the output of the network and prevents overfitting; (2) optimization is performed with stochastic gradient descent (SGD); and (3) different learning rates are used for different models. Using these settings, we compare the classification accuracies of the three networks to analyze how network depth affects the classification results and to verify the effectiveness of transfer learning.
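To make the replaced output layer concrete, here is a minimal sketch in plain Python of global average pooling followed by a dense softmax layer, plus a single SGD update. It is not the authors' implementation: the function names are hypothetical, and the learning-rate values in `LEARNING_RATES` are placeholders, since the article does not state the values actually used.

```python
import math


def global_average_pooling(feature_map):
    """Average an H x W x C feature map over its spatial dimensions -> C values."""
    h, w, c = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    return [sum(feature_map[y][x][k] for y in range(h) for x in range(w)) / (h * w)
            for k in range(c)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense_softmax(features, weights, biases):
    """Dense layer followed by softmax; weights[j] holds the weights of class j."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


def sgd_step(params, grads, learning_rate):
    """One plain SGD update: p <- p - learning_rate * g."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


# Placeholder learning rates, one per backbone (hypothetical values).
LEARNING_RATES = {"ResNet18": 1e-3, "ResNet50": 1e-4, "Inception_v3": 1e-4}

# Toy 2 x 2 feature map with 2 channels, standing in for backbone output.
feature_map = [[[1.0, 2.0], [3.0, 4.0]],
               [[5.0, 6.0], [7.0, 8.0]]]
pooled = global_average_pooling(feature_map)   # one value per channel
probs = dense_softmax(pooled, [[0.1, -0.2], [-0.1, 0.2]], [0.0, 0.0])
print(pooled, probs)
```

Because global average pooling has no trainable parameters, the new head adds only the dense layer's weights, which is one reason it is commonly used to limit overfitting on small datasets.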
Verification Method
To test the performance of the proposed model, the tongue image data was segmented and used to train neural networks of different depths to classify three tongue features: tooth marks, cracks, and tongue thickness. We then compare and verify the results.
The experimental data used in the validation were collected from several hospitals and classified by TCM experts: with tooth marks, 516 images; without tooth marks, 566 images; with cracks, 391 images; without cracks, 250 images; thick, 392 images; thin, 130 images. The size of the original images is fixed at 5568 × 3172 pixels.
Results
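The class counts above can be checked with a short calculation. The sketch below sums the counts and computes, for each feature, the accuracy a trivial classifier would reach by always predicting the larger class. This baseline is added here only as a reference point and is not reported in the paper; the dictionary keys are labels chosen for this example.

```python
# Image counts per class, as listed in the article.
counts = {
    "tooth marks": {"present": 516, "absent": 566},
    "cracks": {"present": 391, "absent": 250},
    "thickness": {"thick": 392, "thin": 130},
}


def summarize(counts):
    """Return the total image count and the majority-class baseline per feature."""
    total = 0
    baselines = {}
    for feature, classes in counts.items():
        n = sum(classes.values())
        total += n
        # Accuracy obtained by always predicting the most frequent class.
        baselines[feature] = max(classes.values()) / n
    return total, baselines


total, baselines = summarize(counts)
print(total)  # 2245
for feature, baseline in baselines.items():
    print(f"{feature}: majority-class baseline = {baseline:.1%}")
```

The total matches the 2245 images of the dataset. The thickness task is the most imbalanced: always answering "thick" would already be correct for about 75% of the images, which is worth keeping in mind when reading accuracy figures.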
Comparison results with models other than transfer learning
The purpose of this evaluation is to compare the performance of the transfer learning model proposed in this study with that of conventional image analysis models. Three types of features are evaluated: I. tooth marks; II. cracks in the tongue; III. tongue thickness. Three image analysis models are used for comparison: ResNet18, ResNet50, and Inception_v3.
The evaluation results (Table 1) show that the proposed method improves the classification accuracy of tongue image features while reducing the training cost of the deep neural network, that is, training of the network model is accelerated. The average classification accuracy is 95.92%, and the proposed method outperforms the conventional methods on all three tongue image features.
Discussion
To eliminate the subjectivity of tongue diagnosis in Chinese medicine, automatic diagnosis using tongue images is attracting attention, but the small number of samples of these images has been pointed out as a challenge. This study investigated the classification performance of tongue images using transfer learning, which can achieve high accuracy even with a small number of samples. The proposed method selectively combines models with different learning rates to extract image features more accurately. The evaluation results show that the proposed method outperforms the conventional methods in all three types of feature classification, with an average accuracy of 95.92%. From these results, transfer learning is expected to achieve high accuracy in tongue image classification and to be an effective solution for eliminating the subjectivity of diagnosticians in TCM tongue diagnosis.
In addition, the evaluation results (Table 1) suggest that the proposed model is robust against the loss of accuracy that can accompany an increase in the number of layers. In general, some models become more accurate as the number of network layers increases, while others suddenly lose accuracy once the number of layers exceeds a certain level; this behavior is observed for ResNet in Table 1. It may be caused by overfitting to the dataset, which lowers classification performance beyond a certain depth. Because the proposed method allows a different learning rate to be selected for each model, it can avoid overfitting and improve classification accuracy.
On the other hand, the following issues can be considered. First, the paper does not describe the evaluation of tongue image segmentation in detail. Since tongue images are generally captured with devices such as ordinary cameras, they are subject to measurement errors, that is, differences between devices and between the people capturing the images. In fact, several studies focusing on tongue image extraction (segmentation) that take these issues into account have been reported. Although this paper describes LBP-based segmentation as preprocessing, it does not discuss its impact on accuracy, which therefore remains unclear. A possible solution is to use multiple imaging devices in the evaluation. Second, accuracy is used as the evaluation index for classification. Several studies have noted that most tongue image datasets are imbalanced in constitution and features, so an evaluation based on accuracy is likely to be strongly affected by the imbalance and may not be an appropriate index. It is therefore necessary to also report an index such as the area under the ROC curve (AUC), which is not distorted by the proportion of positive samples.
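The concern about accuracy under class imbalance can be illustrated with a small sketch in plain Python. The data are synthetic and chosen only for illustration (nine positive and three negative samples, roughly the thick-to-thin ratio), and the AUC is computed with the pairwise ranking definition: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one.

```python
def accuracy(labels, predictions):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(int(y == p) for y, p in zip(labels, predictions)) / len(labels)


def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic, imbalanced labels: 9 positives and 3 negatives.
labels = [1] * 9 + [0] * 3

# A useless classifier that gives every sample the same score.
scores = [0.9] * 12
predictions = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(labels, predictions))  # 0.75
print(roc_auc(labels, scores))        # 0.5
```

The constant classifier reaches 75% accuracy simply because positives dominate, while its AUC of 0.5 correctly indicates that it cannot separate the two classes at all.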