
GestaltMML, A Multimodal Model For The Diagnosis Of Rare Genetic Disorders
3 main points
✔️ Proposes a new multimodal model, GestaltMML
✔️ Integrates frontal facial photographs, clinical features, and demographic information to enable accurate differential diagnosis of rare genetic disorders
✔️ Using multimodal machine learning significantly improves the predictive accuracy of genetic diagnosis
GestaltMML: Enhancing Rare Genetic Disease Diagnosis through Multimodal Machine Learning Combining Facial Images and Clinical Texts
written by Da Wu, Jingye Yang, Cong Liu, Tzung-Chien Hsieh, Elaine Marchi, Justin Blair, Peter Krawitz, Chunhua Weng, Wendy Chung, Gholson J. Lyon, Ian D. Krantz, Jennifer M. Kalish, Kai Wang
(Submitted on 23 Dec 2023 (v1), last revised 22 Apr 2024 (this version, v2))
Comments: Published on arXiv.
Subjects: Quantitative Methods (q-bio.QM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Genomics (q-bio.GN)
code:
The images used in this article are from the paper, the introductory slides, or were created based on them.
Summary
It is estimated that approximately 6% of the world's population is affected by some form of rare genetic disease. A rare disease is commonly defined as one affecting fewer than 200,000 people in the U.S. or fewer than 1 in 2,000 people in Europe, and according to the Orphanet and OMIM databases there are currently at least 7,000 distinct rare genetic disorders.
Due to their rarity and wide phenotypic diversity, genetic diagnosis can be challenging and often requires a lengthy diagnostic process, also known as the "diagnostic odyssey". Patients suspected of having a genetic syndrome undergo genetic testing such as karyotyping, chromosomal microarrays, gene panels, exome sequencing, and genome sequencing, in addition to numerous clinical evaluations, imaging studies, and laboratory tests. The need to make detailed differential diagnoses across many different conditions poses a significant challenge for clinicians, as it is difficult to determine which diagnostic tests to order.
Many genetic disorders are characterized by distinctive facial features, which may provide a clue to diagnosis and aid in prompt referral to a specialist or selection of appropriate genetic testing. However, recognizing syndromes from facial features depends heavily on the clinician's experience: hundreds of rare genetic disorders exhibit characteristic facial features, and the recognition task is far from easy.
More recently, advances in computer vision have led to the development of next-generation phenotyping (NGP) to analyze and predict rare genetic disorders from 2D frontal facial images of patients. One example is DeepGestalt, developed by FDNA Inc., which was pre-trained as a deep convolutional neural network (DCNN) on the CASIA dataset and subsequently fine-tuned on 17,106 frontal facial images of patients covering 216 disorders. However, DeepGestalt addresses only a limited number of syndromes, and covering more syndromes requires collecting new images and re-training the model. To address this, GestaltMatcher was introduced. It uses DeepGestalt's feature layer to form a new representation space (the Clinical Face Phenotype Space, CFPS) in which the closest matches between patients, including those with unknown diseases, can be found. This allows newly identified syndromes to be integrated without changing the model architecture.
However, it is often difficult for facial imaging alone to provide sufficient information to make an accurate diagnosis. For example, syndromes such as Noonan syndrome (NS), Prader-Willi syndrome (PWS), Silver-Russell syndrome (SRS), and Aarskog-Scott syndrome (ASS) share the common feature of short stature, which cannot be captured by frontal facial photographs alone. Sleep disorders, balance disorders, and intellectual disabilities also cannot be effectively captured by pictures of the face or other body features. Additional data is needed to understand these characteristics.
In addition, a number of studies have examined how age, sex, and racial and ethnic differences affect the presentation and frequency of various disorders and syndromes. Certain minority groups are misdiagnosed or diagnosed inaccurately due to systematic biases rooted in data availability and in the collection and analysis process. Against this background, new models are being developed that integrate facial images with clinical HPO terminology.
One example is "Prioritization of Exome Data by Image Analysis" (PEDIA), which combines interpretation of sequence variants with insights from DeepGestalt's advanced phenotyping tools. This approach combines expert evaluation with artificial intelligence analysis by using frontal images to provide a more comprehensive assessment. More recently, an AI-based framework called PhenoScore has been introduced. This framework consists of two modules, facial feature extraction from 2D images and HPO-based phenotypic similarity computation, and uses support vector machines (SVMs) to classify syndromes based on the similarity between the extracted facial features and the HPO. However, existing models process images and text separately and combine the results, which may result in information loss because the interaction between different modalities is not fully captured during training.
To address these challenges, a text-only GPT-based model called DxGPT, built on the closed-source GPT-4, was recently developed specifically for the diagnosis of rare genetic disorders. The present paper instead aims to process facial images and clinical text in a unified manner using multimodal machine learning (MML), effectively integrating patient facial images with demographic information (age, sex, and ethnicity) and textual information such as clinical notes, while preserving the integrity and richness of the data.
Thus, GPT and other transformer-based multimodal machine learning models are transforming the prediction and diagnosis of rare genetic disorders. Originating in the landmark paper "Attention Is All You Need," transformers process data sequences in parallel using a self-attention mechanism, which makes models efficient to train and scalable to large datasets.
The technology has been widely applied in the fields of natural language processing (NLP) and computer vision (CV), demonstrating its effectiveness in tasks ranging from machine translation, text generation, and sentiment analysis to image classification, object detection, and visual question answering. In addition, recent research has developed several innovative multimodal models that leverage transformers, including ViLT, CLIP, VisualBERT, ALBEF, and Google Gemini.
In this paper, the authors use these state-of-the-art technologies to develop a new approach, GestaltMML, which aims to further improve the accuracy and efficiency of rare genetic disease diagnosis and to improve the care process for patients.
Experiment Summary
Figure (A) below illustrates the overall workflow: GestaltMML uses appropriately preprocessed facial images, demographic information, and clinical phenotype descriptions from GMDB (the GestaltMatcher database), together with descriptions of the clinical features of each disease from OMIM (Online Mendelian Inheritance in Man).
Figure (B) below shows GestaltMML's data preprocessing pipeline, using Sotos syndrome as an example: GMDB face images are cropped with "FaceCropper", resized to 112×112, and rotated. The training text is divided into two categories: (1) demographic information + HPO text data, and (2) demographic information + clinical features from the OMIM database summarized by ChatGPT.
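The two text categories can be sketched as simple string assembly. The function name, field names, and example values below are illustrative assumptions, not the paper's actual preprocessing code:

```python
def build_text_inputs(age, sex, ethnicity, hpo_terms, omim_summary):
    """Assemble the two kinds of training text described above.

    All field names and example values are hypothetical; the paper's
    exact text format is not specified here.
    """
    demo = f"Age: {age}; Sex: {sex}; Ethnicity: {ethnicity}."
    # Category (1): demographics + HPO phenotype terms
    hpo_text = demo + " Phenotypes: " + ", ".join(hpo_terms) + "."
    # Category (2): demographics + ChatGPT-summarized OMIM clinical features
    omim_text = demo + " Clinical features: " + omim_summary
    return hpo_text, omim_text

hpo, omim = build_text_inputs(
    age="5 years", sex="male", ethnicity="European",
    hpo_terms=["Macrocephaly", "Tall stature", "Intellectual disability"],
    omim_summary="Overgrowth with a distinctive facial gestalt.",
)
print(hpo)
```

The OMIM-derived texts act as a form of data augmentation: each disease gains a second textual description even when a patient's own HPO annotations are sparse.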
Figure (C) below shows the architecture of GestaltMML, which is built on ViLT and uses a transformer encoder that can process both text and image input. The architecture is similar to ViT, but differs in that ViT accepts only image input.
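The key idea of a ViLT-style encoder is that flattened image patches and text tokens are concatenated into one sequence before the transformer, so self-attention spans both modalities. The following numpy sketch illustrates only that input construction; the patch size, embedding dimension, vocabulary size, and random weights are assumptions for illustration, not ViLT's real parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions): 112x112 preprocessed face image,
# 16x16 patches, shared embedding dimension 768, toy vocabulary of 1000.
H = W = 112
P = 16
D = 768

image = rng.standard_normal((H, W, 3))

# 1) Split the image into non-overlapping P x P patches and flatten each.
patches = image.reshape(H // P, P, W // P, P, 3).transpose(0, 2, 1, 3, 4)
patches = patches.reshape(-1, P * P * 3)            # (49, 768)

# 2) Linearly project patches into the embedding space (random weights here).
W_img = rng.standard_normal((P * P * 3, D)) * 0.02
img_tokens = patches @ W_img                        # (49, D)

# 3) Embed text tokens via a (random) lookup table standing in for a
#    learned vocabulary embedding.
text_ids = [1, 7, 42, 2]                            # placeholder token ids
emb_table = rng.standard_normal((1000, D)) * 0.02
txt_tokens = emb_table[text_ids]                    # (4, D)

# 4) Concatenate: the transformer encoder attends over BOTH modalities
#    jointly, which is what lets image and text interact during training,
#    instead of being processed separately and merged afterwards.
sequence = np.concatenate([img_tokens, txt_tokens], axis=0)
print(sequence.shape)  # (53, 768)
```

This joint sequence is what distinguishes the approach from pipelines like PEDIA or PhenoScore, where image and text features are computed separately and combined only at the end.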
GestaltMML is a multimodal machine learning model that combines facial photographs, demographic information, and clinical textual data. The database used (GMDB v1.0.9) contains 9,764 frontal face photographs from 7,349 patients affected by 528 rare genetic disorders. The database includes patients from diverse backgrounds, including Middle Eastern/West Asian, Native American, Southeast Asian, and North African. However, patients of European descent comprise the majority (59.48%), and the nature of rare diseases makes it difficult to prepare a completely balanced dataset, which remains a challenge. The proportion of males and females is almost even, and 64.90% of patients are under 5 years of age.
In addition, following conventions from previous studies, model performance is evaluated separately for frequent (>6 patients, GMDB-frequent) and rare (<6 patients, GMDB-rare) diseases within GMDB. The paper then explores the importance of text and image features and compares GestaltMML with current image-based models.
Finally, the proposed method was evaluated on multiple externally validated datasets, including data from the Children's Hospital of Philadelphia (CHOP), the New York State Institute for Basic Research in Developmental Disabilities (NYSIBRDD), and from the published literature, which showed high performance. These results demonstrate the robustness of the proposed method.
Classification of Rare Genetic Disorders in GMDB
To cope with the large amount of missing text data, the authors experimented with training-to-test data split ratios varying from 1:1 to 9:1 and calculated the mean and standard deviation of accuracy over three different random seeds. The most effective training-to-test ratio was 3:1, at which the model achieved its highest accuracy: 72.54% top-1, 83.59% top-10, 88.96% top-50, and 91.64% top-100.
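The top-k metric used above can be sketched in a few lines: a prediction counts as correct if the true disease appears among the k highest-scoring classes. Function and class names here are illustrative, not the paper's code:

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring
    predicted classes. `scores` is a list of {class_name: score} dicts;
    this structure is an illustrative assumption."""
    hits = 0
    for score_dict, label in zip(scores, labels):
        ranked = sorted(score_dict, key=score_dict.get, reverse=True)[:k]
        hits += label in ranked
    return hits / len(labels)

# Toy example with three made-up syndrome classes.
scores = [
    {"Sotos": 0.7, "BWS": 0.2, "CdLS": 0.1},   # true label ranked 1st
    {"Sotos": 0.1, "BWS": 0.3, "CdLS": 0.6},   # true label ranked 2nd
]
labels = ["Sotos", "BWS"]
print(top_k_accuracy(scores, labels, 1))  # 0.5
print(top_k_accuracy(scores, labels, 2))  # 1.0
```

With 528 candidate diseases, top-10 or top-50 accuracy is the clinically relevant quantity: the model's job is to shortlist conditions for confirmatory genetic testing, not to issue a single final diagnosis.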
However, the challenge is that although the GMDB contains 528 diseases, the number of rare diseases studied is small because there are thousands of rare diseases; databases such as the GMDB only document diseases with characteristic morphological features, so for diseases without distinctive facial features the effectiveness of this model may be limited. Nevertheless, the combination of demographic and clinical phenotypic information is expected to help prioritize diseases in these cases as well.
Functional Importance Analysis and Comparison with Existing Image Models in The GMDB Dataset
While many previous studies used only facial images to predict rare genetic disorders, this paper analyzes feature importance in detail by comparing the latest ensemble image models with GestaltMML. The comparison employs the train-test partitioning used in previous studies, splitting GMDB into a frequent disease group (GMDB-frequent) and a rare disease group (GMDB-rare).
In particular, GestaltMML is unique in using only the transformer architecture, with no convolutional processing; the image-only comparison models, by contrast, do not use transformers. The results, shown in the table below, confirm that GestaltMML achieves higher accuracy with the image-text combination.
Specifically, 7,755 images were used for training, with 792 test images in GMDB-frequent and 360 in GMDB-rare. The results show that GestaltMML achieves remarkable prediction accuracy in both evaluations.
In addition, an evaluation technique called "modality masking" was used to test predictions from text as well as from images. Here, the text portion of ViLT's input was replaced with "*" and the model fine-tuned on face images alone, allowing prediction accuracy from images only to be compared with that from images plus text. The analysis showed that the image-only GestaltViT performed worse than the ensemble image model; adding textual information substantially improved accuracy, with GestaltLT outperforming the other models while remaining slightly below GestaltMML.
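The masking step itself is simple to sketch: every character of the text input is replaced with "*" so the fine-tuned model can extract no information from that modality. This per-character version is an assumption about how the masking is applied; the paper's exact routine may differ:

```python
def mask_text_modality(text, mask_char="*"):
    """Replace every non-whitespace character of the text input with
    `mask_char`, preserving length and spacing so the input shape is
    unchanged while the content is destroyed. A sketch of the
    "modality masking" idea, not the paper's exact implementation."""
    return "".join(mask_char if not ch.isspace() else ch for ch in text)

print(mask_text_modality("Age: 5 years; Phenotypes: Macrocephaly"))
```

The same trick applied to the image side (masking pixels instead of text) yields the complementary text-only ablation, so the contribution of each modality can be measured in isolation.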
This experiment highlights how important the combination of image and text data is in the diagnosis of rare genetic diseases and shows how GestaltMML works with a multimodal approach.
Improved Fairness to Minority Group Diagnoses
GestaltMML was trained on GMDB (version 1.0.9), which contains data on patients from diverse ethnic backgrounds, including "Middle East/West Asia," "Native American," "Southeast Asia," "North Africa," "Unknown," "African American," "America - Latino/Hispanic," "East Asia," "Other Asia," "South Asia," "Sub-Saharan," and "Africa."
Through the integration of facial images, demographic information, and clinical text, the model significantly improved prediction accuracy, especially for patients from non-Western minority groups. The figure below shows the average accuracy when using different inference modalities, and it is clear that clinical text has the greatest impact on performance improvement. Demographic information has also been shown to be beneficial for minority patients.
The figure below also shows how GestaltMML's integration of facial images, demographic data, and clinical text improves accuracy across minority ethnic groups compared with training limited primarily to individuals of European descent, with only very rare exceptions.
This experiment provides valuable insight into how GestaltMML can improve diagnostic fairness.
Excellent Performance in Clustering Diseases with Clinical Similarities
For GestaltMML, a UMAP clustering analysis was performed on logit values taken from the penultimate layer (the layer before the final classification layer), demonstrating the model's ability to group clinically similar diseases. The analysis specifically compared Beckwith-Wiedemann syndrome (BWS) with Sotos syndrome, NAA10-related syndrome with NAA15-related syndrome, and KBG syndrome with Cornelia de Lange syndrome (CdLS).
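The underlying idea, projecting each patient's penultimate-layer feature vector into 2-D and checking whether syndromes separate, can be illustrated with synthetic data. Since the paper uses UMAP, which needs the umap-learn package, this sketch substitutes a plain PCA projection as a linear stand-in; the feature dimension, group means, and all values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy penultimate-layer feature vectors for two synthetic syndrome groups.
# Real GestaltMML features would be 528-dimensional class logits; here we
# use 8 dimensions and well-separated group means purely for illustration.
group_a = rng.standard_normal((20, 8)) + np.array([5.0] + [0.0] * 7)
group_b = rng.standard_normal((20, 8)) - np.array([5.0] + [0.0] * 7)
X = np.vstack([group_a, group_b])

# PCA via SVD of the centered feature matrix (a linear stand-in for UMAP).
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = Xc @ Vt[:2].T          # 2-D embedding of each patient

# If the model has learned distinct representations, the first principal
# axis separates the two groups cleanly.
side = coords[:, 0] > 0
print(side[:20].sum(), side[20:].sum())
```

With UMAP the projection is nonlinear, but the workflow is the same: fit the reducer on the stacked feature matrix and color each 2-D point by diagnosis.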
First, analyses were performed on two genetic subtypes of BWS patients and a patient with Sotos syndrome, confirming that the model can clearly distinguish between these overgrowth syndromes.
Next, NAA10- and NAA15-related neurodevelopmental syndromes were evaluated in the GMDB (v1.0.9) dataset, showing that the model can effectively distinguish these two syndromes despite their similar clinical phenotype.
In the final analysis, patients with KBG syndrome and CdLS were used to confirm that the model can separate these syndromes. For CdLS patients, however, inference from facial images revealed two clusters driven by differing image background colors, suggesting that background-color normalization could improve the image representation.
These results demonstrate how well GestaltMML performs in identifying disease clusters with clinical similarities and are expected to improve diagnostic accuracy through further improvements.
Summary
GestaltMML, the new multimodal model presented in this paper, integrates frontal facial photographs, clinical features, and demographic information to effectively narrow the differential diagnosis of rare genetic disorders. Such an approach is critical because simply relying on a patient's facial image does not cover all the information needed for an accurate diagnosis of these diseases. Multimodal machine learning can significantly improve the predictive accuracy of genetic diagnosis and is a useful tool for distinguishing clinically similar rare diseases using UMAP clustering analysis.
This clustering approach can identify previously unrecognized rare diseases without changing the model's classification layer and, in combination with genome/exome sequencing data, is expected to facilitate interpretation and periodic reinterpretation of the data, helping to address the challenge of the "diagnostic odyssey."
Compared with traditional CNN-based image models, this approach takes both facial images and text as input, achieving remarkable advances in the prediction of rare genetic diseases. In particular, by integrating patient demographic data into the textual input, the model identifies patterns unique to each disease, reducing bias in data collection and analysis and promoting fairer diagnosis. Data augmentation using the OMIM database was also introduced to enhance model training, and modality masking was used to assess the importance of textual and visual elements during multimodal learning, providing insight for future research.
These results are of great importance to medical professionals and researchers because they have the potential to revolutionize the diagnosis of rare diseases in the future, and further progress is expected.