Tongue Image Diagnosis By Deep Learning: Understanding Systemic Disorders From The Tongue! Part3

Medical Care 27/04/2021

3 main points
✔️ In Oriental medicine, a person's constitution determines both the tendency toward disease and the treatment policy, and Chinese medicine classifies it into nine types. Because diagnosis depends on subjectivity and experience and is hard to generalize, deep learning on tongue images is attracting interest.
✔️ This paper proposes a complexity perception (CP) method that automatically identifies the constitution from tongue images: it detects and calibrates the tongue image, then selects a classifier according to the complexity of each image to cope with environmental conditions and uneven data distribution.
✔️ Evaluation on three different-sized sets of tongue images taken in a hospital showed that the proposed method outperformed conventional models such as ResNet and VGG-16, which suggests that splitting a dataset by complexity and using different classifiers is useful.

Complexity perception classification method for tongue constitution recognition
written by Jia jiong Ma, Gui hua Wen, Chang jun Wang, Li jun Jiang
(Submitted on May 2019)
Comments: Published by arXiv
Subjects: CNN (cs: CNN)

Background

What is an appropriate way to build a classifier for a dataset in which various environmental factors are mixed?

This paper focuses on diagnosing the constitution from the tongue, one of the diagnostic methods of Oriental medicine. Tongue diagnosis has been used in China since ancient times as part of inspection (diagnosis based on appearance), one of the four diagnostic methods. It is useful for understanding disorders of the entire body, including specific diseases, and for treating not only the symptoms but also the root of the disease. For this reason, it is said to be particularly effective for intractable and rare diseases that are difficult to treat with Western medicine, and the need for it is increasing as diseases become more diverse. On the other hand, this kind of diagnosis depends largely on the experience and subjective judgment of the doctor, so it is difficult for inexperienced doctors and people without specialist knowledge to practice. In this context, deep learning, which has made remarkable progress in recent years, has been attracting attention: a model that learns features from tongue images may allow people with little experience or knowledge to reach an accurate diagnosis.

In this study, we focus on the diagnosis of the constitution within tongue diagnosis and propose a model to learn it. As we will see later, the constitution is one of the foundations for deciding the treatment policy in traditional Chinese medicine (TCM), so identifying it accurately supports an accurate overall diagnosis. In particular, the proposed method focuses on mitigating the variability in imaging conditions and the uneven distribution of samples. This article provides an overview of the proposed method.

What is Oriental Medicine?

First of all, I would like to briefly explain Oriental medicine.

Oriental medicine is a traditional medicine with a history of about 2,000 years and has different characteristics from Western medicine, which has been introduced in many developed countries today. For example, while Western medicine directly approaches the bad part of the body by medication or surgery, Oriental medicine mainly treats the body's disorder fundamentally from the inside. In addition, while Western medicine can treat diseases in a relatively short time, Oriental medicine takes more time, but it is less burdensome to the body. Oriental medicine is characterized by the use of herbal medicines, Chinese herbs, acupuncture, and moxibustion, and the often-heard term "acupuncture points" is also an original concept of Oriental medicine.

Oriental medicine is based on the concept of "qi, blood, and water" and diagnoses and treats the whole body rather than specific organs, because the organs are related to each other. Here, "qi" refers to the energy necessary to carry out vital activities, "blood" refers to blood, and "water" refers to body fluids other than blood, such as lymph fluid and sweat. Health is defined as a state in which "qi, blood, and water" flow smoothly through the body without excess or deficiency. These three elements are thought to affect each other, and if one of them is abnormal, the entire balance and physical condition of the body will be disrupted, so the balance of qi, blood, and water is important. In particular, qi is considered to be the source of life force, and as the saying goes, "disease begins with qi", so the management of qi is of utmost importance.

What are the five organs?

The five organs circulate qi, blood, and water as mentioned above, and consist of five parts: liver, heart, spleen, lung, and kidney. Although they share their names with the organs of Western medicine, they are different concepts (there is some overlap).

The outline of the function of each organ is as follows: Liver: "Storage of blood and control of the autonomic nervous system", Heart: "Circulation of blood and regulation of sleep rhythm", Spleen: "Metabolism and supply of nutrition to muscles", Lung: "Circulation of qi, blood, and water to the whole body and protection from external enemies", Kidney: "Growth, development, reproduction, aging and excretion". It is believed that by regulating these organs and maintaining the circulation of qi, one can maintain a healthy state for a long time and live a long life.

In addition, the six internal organs are related to the five internal organs in a master-servant relationship. They are like the children of the five viscera and consist of the gallbladder, small intestine, stomach, large intestine, bladder, and sanjiao, which correspond to the liver, heart, spleen, lungs, kidneys, and pericardium (the membrane that surrounds the heart), respectively.

Constitution in Oriental Medicine

The constitution in TCM is regarded as a combination of inborn and acquired characteristics of a person's life activity. It is one of the most important factors in diagnosis: it is closely related to characteristic diseases, determines the tendency of diseases, and is used to decide treatment. Constitutions are classified into nine types: qi deficiency, yin deficiency, yang deficiency, phlegm-dampness, damp-heat, qi depression, blood stasis, special constitution, and mildness. For example, qi deficiency is the aforementioned deficiency of qi and is considered to be a constitution that is prone to general malaise and dizziness and also prone to illness. Questionnaires are used to identify these constitutions, but they are easily influenced by the subjective intentions of individuals and take an enormous amount of time. Therefore, attention has been paid to methods that complement questionnaires by determining the constitution from images of the tongue.

What is tongue diagnosis?

Tongue diagnosis (diagnosis using the tongue) is one of the diagnostic methods of Oriental medicine and is used to diagnose diseases based on the shape and color of the tongue. In Oriental medicine, the tongue reflects the state of health inside the body (internal organs, qi, blood, cold, heat, etc.) as well as the severity and progression of the disease.

On the other hand, traditional tongue diagnosis is largely based on the subjective observation of the practitioner and has the problem of bias due to individual experience and environmental changes (such as lighting). In particular, in the aforementioned diagnosis of constitution, many methods have been reported to propose frameworks for constitutional evaluation and generalization of tongue images, because the constitution itself is subjective. In order to alleviate the barriers to diagnosis caused by these subjective aspects, the introduction of deep learning has led to the development of objective and quantitative methods for tongue diagnosis.

Previous studies on constitutional diagnosis by tongue imaging

Most studies on constitutional recognition using tongue images have analyzed statistically significant correlations between tongue images and constitution types, and there are few studies on automatic constitutional discrimination using machine learning. The task can be regarded as an image classification problem that assigns each tongue image to one of the nine constitutional types, so deep learning is a natural candidate, but as of now there are few studies that investigate automatic feature extraction for it. In addition, while deep convolutional neural networks have been considered for tongue images, a framework that combines tongue detection, tongue calibration, and constitutional recognition has not been presented. The authors have presented a system framework that realizes constitutional recognition automatically, consisting of tongue image acquisition, tongue coating detection, tongue coating calibration, tongue feature extraction, and constitutional classification using deep learning methods.

Purpose of this study

In this paper, we propose a complexity perception (CP) classification method that uses different classifiers based on the difficulty of image classification to achieve constitutional recognition with tongue images in the diagnostic framework that has been proposed by the authors. The proposed model takes into account individual-level complexity in tongue images to mitigate the effects of uneven image distribution caused by various environmental conditions such as illumination and resolution.

Outline of the proposed method

In order to accurately extract the features of tongue images, the authors propose a system framework (Fig. 1) that combines deep learning methods with a large learning database to identify the constitution, which consists of six steps. The first step is the acquisition of tongue images, which are assumed to be captured with a camera in a natural environment. The second step is pre-processing, which includes tongue detection. Detection is done using Faster R-CNN, a model used in the field of object detection, and the detected tongue region is then calibrated using VGG to acquire an accurate image. After that, the tongue coating image is segmented from the whole tongue image. In the third step, the features of the segmented tongue image are extracted using deep learning techniques, and finally the constitutional diagnosis is made with the CP method.

Complexity perception (CP) classification method

The complexity perception (CP) method uses different classifiers according to the difficulty of classification in order to increase estimation accuracy. Usually, a learning model is highly accurate for simple samples that are easy to classify, but its accuracy tends to decrease for complex samples that are difficult to classify (i.e., samples near the decision boundary or noisy samples). In particular, the ease of classifying a sample is expected to vary greatly with the conditions under which the image was captured, such as lighting. The CP method quantitatively evaluates the difficulty of classifying each sample, divides the samples accordingly, and trains on the easy and difficult samples separately. The aim is to classify tongue images taken under various conditions with high accuracy.

As a specific complexity model in the proposed method, we label samples by a measure of sample simplicity (see below) and construct a learning model to discriminate simple and complex samples by K-nearest neighbor method and logistic regression model. In training, we adjust parameters from samples and labels.

While building a learning model for simplicity classification in this way, we classify each sample as easy or difficult to identify and train a classifier for each category (Figure 2). The first step is to detect and calibrate the tongue coatings of the samples. The second step is feature extraction using deep learning models such as ResNet-50, Inception-V3, and VGG-16, as well as LBP and Color-Moment. The third step is to divide the training samples into easy and hard to learn according to the computed complexity of each sample. The fourth step is to construct new training data from these simplicity labels and train one classifier on the easy data and another on the difficult data to build a model with high performance. In the test phase (right side of Fig. 2), we extract the features of the test sample in the same way as in the training phase, and then apply the learned simplicity discriminant model to determine whether the test sample is easy to classify. If the test sample is easy to classify, the simple model is used; if it is difficult, the complex model is used.
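The training and test flow described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the simplicity measure used here (the fraction of a sample's k nearest neighbours that share its label, thresholded at theta) is an assumed stand-in, the names `simplicity_scores` and `ComplexityRouter` are hypothetical, and the default values of `theta` and `k` are arbitrary. It uses scikit-learn, with logistic regression as the easy/hard discriminator and a decision tree as the basic classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier


def simplicity_scores(X, y, k=5):
    """Assumed measure: share of each sample's k nearest neighbours with the same label."""
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    # Calling kneighbors() without arguments excludes each point from its own neighbours.
    _, idx = nn.kneighbors()
    return (y[idx] == y[:, None]).mean(axis=1)


class ComplexityRouter:
    """Hypothetical sketch: route samples to an 'easy' or a 'hard' classifier."""

    def __init__(self, theta=0.8, k=5):
        self.theta = theta  # assumed threshold on the simplicity score
        self.k = k          # assumed neighbourhood size

    def fit(self, X, y):
        easy = simplicity_scores(X, y, self.k) >= self.theta
        if easy.all() or not easy.any():
            raise ValueError("need both easy and hard samples; adjust theta or k")
        self.easy_mask_ = easy
        # Discriminator that predicts whether a new sample is easy (1) or hard (0).
        self.gate_ = LogisticRegression(max_iter=1000).fit(X, easy.astype(int))
        # One basic classifier per subset.
        self.easy_clf_ = DecisionTreeClassifier(random_state=0).fit(X[easy], y[easy])
        self.hard_clf_ = DecisionTreeClassifier(random_state=0).fit(X[~easy], y[~easy])
        return self

    def predict(self, X):
        route_easy = self.gate_.predict(X) == 1
        pred = np.empty(len(X), dtype=self.easy_clf_.classes_.dtype)
        if route_easy.any():
            pred[route_easy] = self.easy_clf_.predict(X[route_easy])
        if (~route_easy).any():
            pred[~route_easy] = self.hard_clf_.predict(X[~route_easy])
        return pred
```

With feature vectors `X` and constitution labels `y` as NumPy arrays, `ComplexityRouter().fit(X, y).predict(X_test)` follows the same order as the text: label the training samples as easy or hard, train the discriminator and the two classifiers, then route each test sample to one of them.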

Verification Method

To evaluate the performance of the proposed method, we constructed a dataset of tongue images and compared the method with several related state-of-the-art approaches. The dataset contains 22,482 tongue images labeled by constitution, and the constitution types are unevenly represented (Figure 3).

In addition, two types of features are used for the proposed method: deep image features from the VGG-16, Inception-V3, and ResNet-50 models, and hand-crafted color and texture features. The networks are pre-trained on ImageNet and trained for 100 epochs with a batch size of 64 using Adam, with the initial learning rate set to 0.0001. We apply LBP for texture feature extraction and Color-Moment for color feature extraction. First, the tongue image is segmented into regions, and LBP is applied to each region with eight sampling points on a circle of radius 1; the resulting features are concatenated and reduced to 50 dimensions using PCA. Color-Moment extracts features describing the color distribution of the tongue image from the mean, variance, and skewness of the image. Finally, the texture and color features are combined into a single feature vector.
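To make the hand-crafted features concrete, here is a minimal NumPy sketch of the two descriptors. It is an illustration under assumptions rather than the paper's code: the function names are hypothetical, the LBP uses the plain 3x3 neighbourhood as an approximation of eight points on a circle of radius 1 (no interpolation), skewness is taken as the standardised third moment, and the per-region segmentation and the PCA reduction to 50 dimensions are omitted.

```python
import numpy as np


def color_moments(image):
    """Per-channel mean, variance, and skewness of an H x W x C image."""
    pixels = image.reshape(-1, image.shape[-1]).astype(np.float64)
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    var = (centered ** 2).mean(axis=0)
    std = np.sqrt(var)
    third = (centered ** 3).mean(axis=0)
    # Standardised third moment; set to 0 for channels with no variation.
    skew = np.divide(third, std ** 3, out=np.zeros_like(third), where=std > 0)
    return np.concatenate([mean, var, skew])


def lbp_histogram(gray):
    """Normalised 256-bin histogram of basic LBP codes (8 neighbours, 3x3 window)."""
    g = gray.astype(np.float64)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit where the neighbour is at least as bright as the centre pixel.
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()


def tongue_features(image):
    """Concatenate texture (LBP) and color (moments) features for one RGB image."""
    gray = image.mean(axis=-1)
    return np.concatenate([lbp_histogram(gray), color_moments(image)])
```

For an RGB image this yields a vector of 256 texture values followed by 9 color values. In the setup described in the text, the LBP part would be computed per region and compressed with PCA before being combined with the color features.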

In this paper, we split the training set and test set by 5-fold cross-validation, and 15% of the training set is used as the validation set. We also use three basic classifiers, Softmax, SVM, and DecisionTree, to evaluate the effectiveness of CP. The optimal hyperparameter θ is determined by the validation set, and the parameters N and k are set to 150 and 50 based on our experimental experience. Finally, the average classification accuracy of five experiments is used to evaluate the performance of the method.
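The data split can be sketched as follows. This is a minimal illustration that assumes a simple shuffled, non-stratified split (the text does not say whether the folds are stratified), and the function name `five_fold_splits` and the `seed` parameter are hypothetical.

```python
import numpy as np


def five_fold_splits(n_samples, n_folds=5, val_fraction=0.15, seed=0):
    """Yield (train, validation, test) index arrays for each cross-validation fold."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, n_folds)
    for i in range(n_folds):
        test_idx = folds[i]
        rest = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        # Hold out a share of the remaining training data for validation.
        n_val = int(round(val_fraction * len(rest)))
        yield rest[n_val:], rest[:n_val], test_idx
```

Each fold's accuracy would then be averaged over the five runs, with the validation part used to choose the hyperparameter θ.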

Result

The result of varying the training data

We evaluate the performance of the CP on three datasets in order to investigate how the performance varies across different training datasets.

The table below shows the comparison results on the datasets Tongue-100, Tongue-80, and Tongue-60, where Tongue-60 means that 60% of all training samples were used in the experiment. ξA is the basic classifier ξ trained on all training samples, ξE is the basic classifier trained on the easy training samples, and ξD is the basic classifier trained on the difficult training samples. The basic classifier ξ is chosen from Softmax, SVM, and DecisionTree.

The experimental results on Tongue-100 show that, compared with training on all samples, the average accuracy is 2.14% higher on the easy-to-identify samples and 12.47% higher on the hard-to-identify samples. From these results, it can be seen that performance is higher when the classifier is trained on the partitioned samples instead of the whole training set. Furthermore, the proposed method CP outperforms the comparative methods on all test samples for all three classifiers, and the table shows that CP performs best with DecisionTree as the basic classifier. Similarly, on Tongue-80 and Tongue-60 the partitioned training sets give higher accuracy, and the improvement in classification performance is larger with DecisionTree than with Softmax or SVM.

Comparison by category

The purpose of this evaluation is to assess the performance of the proposed method for each constitution category. We use both ResNet-50 + Decision Tree and VGG-16 + SVM as baselines and measure the performance of CP on Tongue-100.

The confusion matrices for the test samples (Figure (a) and (b) below) show that, compared with ResNet-50 + Decision Tree alone, CP improves the recognition accuracy for qi deficiency, yin deficiency, and yang deficiency by 2.42%, 3.27%, and 2.11%, respectively, and the recognition accuracy for phlegm-dampness, damp-heat, qi depression, and mildness improves in a similar way. Likewise, the confusion matrices of VGG-16 + SVM and CP (Figure (c) and (d) below) confirm that the classification performance improves for most categories without reducing the accuracy of the others.
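As a reminder of how per-category recognition accuracy is read from a confusion matrix, here is a small sketch. It assumes that rows are true categories and columns are predicted categories, which is a common convention but is not stated in the text. The function name is hypothetical and the numbers in the usage note are made up for illustration.

```python
import numpy as np


def per_class_accuracy(confusion):
    """Fraction of each true category (row) that was predicted correctly."""
    confusion = np.asarray(confusion, dtype=np.float64)
    totals = confusion.sum(axis=1)
    # Categories with no samples get an accuracy of 0 instead of a division error.
    return np.divide(np.diag(confusion), totals, out=np.zeros_like(totals), where=totals > 0)
```

For a toy two-category matrix `[[8, 2], [1, 9]]`, the function returns 0.8 and 0.9. Subtracting the baseline's per-category values from those of CP gives the kind of per-category improvement quoted above.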

Evaluation of data reinforcement for imbalance

This evaluation investigates the performance of the proposed method on imbalanced data.

It is well known that when the categories in a dataset are imbalanced, a classifier tends to assign samples to the larger categories, which lowers classification accuracy. Therefore, we added an experiment with data augmentation to examine whether it improves the performance of our method.

We selected the VGG-16 model, which had the highest recognition accuracy, and used Horizontal Flip, Random Crop, Random Shift, and Random Rotation for data augmentation (see the table below). The results confirm that the proposed method performs best among the compared methods.
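The four augmentations named above can be sketched with NumPy and SciPy as follows. The parameter names (`size`, `max_shift`, `max_degrees`) and the edge-filling behaviour are assumptions for illustration; the text does not give the ranges that were actually used.

```python
import numpy as np
from scipy import ndimage


def horizontal_flip(image):
    """Mirror an H x W x C image left to right."""
    return image[:, ::-1]


def random_crop(image, size, rng):
    """Cut a random patch of shape size = (height, width) out of the image."""
    h, w = image.shape[:2]
    ch, cw = size
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    return image[top:top + ch, left:left + cw]


def random_shift(image, max_shift, rng):
    """Translate the image by up to max_shift pixels, repeating edge pixels."""
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    return ndimage.shift(image, (dy, dx, 0), order=0, mode="nearest")


def random_rotation(image, max_degrees, rng):
    """Rotate the image by a random angle while keeping its original shape."""
    angle = float(rng.uniform(-max_degrees, max_degrees))
    return ndimage.rotate(image, angle, reshape=False, order=1, mode="nearest")
```

Applying these transforms mainly to the smaller constitution categories is one way to reduce the imbalance during training, although the text does not say how the augmented samples were distributed across categories.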

Consideration

While the characteristics of the tongue, such as its shape, are an important indicator in TCM diagnosis, they have been difficult to generalize because diagnosis relies on the experience and subjective judgment of physicians. Therefore, it has been proposed to generalize them by applying deep learning techniques to tongue images. In this study, in particular, we proposed a CP method that selects a classification model according to how difficult a sample is to classify, given the environmental variation in tongue images.

The evaluation results show that, on average, the accuracy is 2.73% higher on the easy-to-classify samples and 6.52% higher on the complex samples. These results suggest that separating the dataset into two subsets according to complexity can effectively improve classification accuracy. On the difficult test samples, the gain over the classifier trained on all samples is 12.47%, 5.26%, and 1.84% for Tongue-100, Tongue-80, and Tongue-60, respectively, which is consistent with the 6.52% average ((12.47 + 5.26 + 1.84) / 3 ≈ 6.52). This shows that the size of the dataset affects the ability of the classifier to distinguish between easy and difficult samples. The highest performance of the proposed method was obtained when the decision tree was used as the basic classifier.

The results of this study suggest that the model remains valid even when the images come from different sources. Although tongue images of Chinese patients were used in this evaluation, the proposed method is not expected to depend on ethnicity and should be easy to extend to other data.

On the other hand, we consider the following challenges. We believe that multi-label learning is necessary to recognize the constitution of the tongue. Since the human constitution is complex, and it is assumed that there are cases in which several of the nine constitutions mentioned above are included at the same time, multi-label learning would be effective for such cases. Finally, a new way of measuring the complexity of a sample could be designed.

今給黎薫弘