A Proposed Framework To Support The Rehabilitation Of Dysphagia


3 main points
✔️ Rehabilitation of dysphagia requires accurate measurement of the reaction time of the pharyngeal swallowing reflex in the videofluoroscopic swallowing study (VFSS), but because the measurement demands millisecond-level accuracy, results can vary with the physician's experience.
✔️ To accurately measure the reaction time of the swallowing reflex regardless of experience, we propose a new framework that can automatically detect events of short duration.
✔️ The average class detection rate during the swallowing reflex was 97.5% (validation). These results suggest that the reaction time of the pharyngeal swallowing reflex can be measured automatically, so that VFSS can be evaluated independently of the examiner's experience.

Machine learning analysis to automatically measure response time of pharyngeal swallowing reflex in videofluoroscopic swallowing study
written by 
Jong Taek Lee, Eunhee Park, Jong-Moon Hwang, Tae-Du Jung & Donghwi Park
(Submitted on 7 Sep 2020)
Comments: Scientific Reports volume 10, Article number: 14735 (2020)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The images used in this article are from the paper, the introductory slides, or were created based on them.


Is it possible to use deep learning to create a system that compensates for differences in physician experience and automates the process?

This study aims to develop an automated method for measuring the reaction time of the swallowing reflex, a measurement required in the videofluoroscopic swallowing study (VFSS), an examination used in rehabilitation for dysphagia. Dysphagia is a serious functional disorder that can lead to malnutrition and pneumonia in the elderly, and after its onset, rehabilitation is performed to promote appropriate swallowing. The first step in this rehabilitation process is to evaluate swallowing function with VFSS, which requires measuring the reaction time of the swallowing reflex. Because the measurement demands millisecond precision, the measured time varies with the experience of the physician. To address this issue, the study automates the measurement of the swallowing reflex reaction time using image analysis of pharyngeal images.

What is dysphagia?

First of all, I will briefly explain the subject of this study, dysphagia.

In a nutshell, dysphagia (difficulty eating and swallowing) is the inability to eat or swallow properly. Symptoms include coughing or choking when eating, the inability to chew and swallow solid food, and taking a long time to eat, which can result in malnutrition, dehydration, aspiration, and choking on swallowed food.

Dysphagia can occur across a wide range of ages, from newborns to the elderly. For example, feeding and swallowing disorders in children may be caused by congenital conditions (cerebral palsy, Down's syndrome, etc.) or brain trauma from traffic accidents. In addition, since the act of eating is not innate but acquired through experience, there have been reports of dysphagia occurring when the process or environment of learning how to eat is inappropriate. In adults, cerebrovascular diseases such as cerebral infarction and neuromuscular diseases can cause feeding and swallowing disorders: when commands from the brain are not transmitted properly, the patient may be unable to move the tongue or chew.

There are two particular problems with dysphagia: nutritional decline and pneumonia caused by aspiration (aspiration pneumonia). In the former case, impaired food intake can lead to inadequate nutrition, low nutritional status, and decreased physical function. The latter is particularly common in the elderly, whose swallowing function tends to deteriorate, and the majority of pneumonia in the elderly is said to be caused by aspiration due to age-related deterioration of the swallowing function. In Japan and other developed countries, the population is aging rapidly and the number of patients with pneumonia due to swallowing disorders is rising, so the need for action is increasing.

What is the videofluoroscopic swallowing study (VFSS)?

The VFSS is a test used to evaluate the clinical features of dysphagia and to determine the course of rehabilitation. Swallowing consists of three main phases: the oral phase, when food is chewed and the tongue moves it from the mouth to the pharynx; the pharyngeal phase, when the swallowing reflex moves food from the pharynx to the esophagus; and the esophageal phase, when food is carried to the stomach by peristalsis of the esophagus. The VFSS focuses on the pharyngeal and esophageal phases and treats aspiration as a consequence of a disorder in these phases. The test is evaluated on two axes, reliability (the same result is obtained on repeated measurement) and validity (the likelihood of obtaining the correct answer), using an eight-point scale.

For evaluation using the VFSS, it is necessary to accurately measure the reaction time of the pharyngeal swallowing reflex. This reflex occurs very rapidly (in less than 0.5 seconds) after a food bolus is formed by chewing. Measurement therefore requires a wealth of clinical experience, and inexperienced and experienced clinicians often obtain different times for the pharyngeal swallowing reflex, so a lack of inter-rater reliability has been pointed out as an issue.

Purpose of the study

To remedy this lack of reliability, this study proposes a system that automatically derives the measurement time by applying image analysis to pharyngeal images. The study has three features: reliable measurement of response time, an evaluation method that does not depend on experience, and provision of information for determining rehabilitation strategies. First, it derives reliable estimates of the response time of the pharyngeal swallowing reflex from actual VFSS images; because this data is close to what is used in the field, it is likely to be of great clinical significance. Second, it is expected to help all clinicians determine normal, delayed, or absent swallowing reflexes from VFSS images. Third, it can provide clinical information for determining rehabilitation strategies, such as thermal-tactile stimulation to induce swallowing more rapidly in patients with no swallowing reflex or difficulty swallowing. In the evaluation of the proposed method, we report that the swallowing reflex time can be detected with high accuracy and that the exact response time can be measured automatically. We believe that the method can improve inter-rater reliability in dysphagia rehabilitation by providing accurate measurement times regardless of the physician's level of experience.


Dataset

The dataset consists of VFSS data from 27 individuals who complained of subjective dysphagia. The participants ranged in age from 22 to 84 years (mean age 64.9 ± 15.7 years) and comprised 21 males and 6 females. Three were healthy adults aged 65 years or older (N = 3, 11.1%), while the remaining participants were diagnosed with central nervous system disease (N = 16, 59.2%) or neuromuscular disease (N = 8, 29.6%). In VFSS, the patient sits upright in front of a fluoroscope set at 30 frames per second and ingests eight substances mixed with diluted radiopaque barium (35% w/v): 3, 6, and 9 mL of curd yogurt (thick liquid); 3, 6, and 9 mL of water (thin liquid); semisweet rice (semisolid); and steamed rice (solid). Of the 27 participants, 7 ingested all 8 substances and completed 8 pharyngeal swallowing events during VFSS. Nine participants swallowed multiple times for a single substance and therefore completed 8 or more pharyngeal swallowing events. Eleven participants had no aspiration during VFSS. Eleven participants could not swallow all of the substances because of severe aspiration during the VFSS and therefore had fewer than 8 pharyngeal swallowing events (see figure below).

Video clips of pharyngeal swallowing events were acquired at 15 frames per second (FPS), and two experts evaluated the start and end points of the pharyngeal swallowing reflex in each video and assigned ground-truth labels.


Building on prior work, and to speed up training, a pre-trained Inception-V1 architecture is used as the base. It consists of four max-pooling layers, one average-pooling layer, two convolutional layers, and nine inception modules, a design that reduces computational cost and overfitting (see figure below).


Evaluation Conditions

A total of 207 pharyngeal swallowing event clips were extracted from the raw VFSS video, annotated by expert clinicians with the start and end points of the pharyngeal swallowing reflex, and used as the data for training and testing.

Learning and Testing

To demonstrate generalization capability, 5-fold cross-validation was performed on a Titan X GPU, with approximately 80% of the swallowing reflex videos used as the training set and the remainder as the test set. For each fold, 5 to 6 of the 27 participants were selected as test patients, and all of their swallowing reflex videos were held out from the training data as test data. This produced 5 groups of test data, each reported to contain 40 to 42 swallowing reflex events.
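The patient-level split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `patient_level_folds`, the round-robin assignment of patients to folds, and the toy patient IDs and clip counts are all assumptions made for the example.

```python
import random
from collections import defaultdict


def patient_level_folds(clip_patient_ids, n_folds=5, seed=0):
    """Split clip indices into folds so that no patient spans train and test.

    clip_patient_ids: list where entry i is the patient ID of clip i.
    Returns a list of (train_indices, test_indices) tuples, one per fold.
    """
    # Group clip indices by patient.
    clips_by_patient = defaultdict(list)
    for clip_index, patient_id in enumerate(clip_patient_ids):
        clips_by_patient[patient_id].append(clip_index)

    # Shuffle patients reproducibly, then deal them round-robin into folds.
    patients = sorted(clips_by_patient)
    random.Random(seed).shuffle(patients)
    fold_patients = [patients[k::n_folds] for k in range(n_folds)]

    folds = []
    for held_out in fold_patients:
        held_out_set = set(held_out)
        test = [i for p in held_out for i in clips_by_patient[p]]
        train = [i for p in patients if p not in held_out_set
                 for i in clips_by_patient[p]]
        folds.append((sorted(train), sorted(test)))
    return folds


# Hypothetical toy data: 27 patients with a varying number of clips each.
toy_ids = [f"P{p:02d}" for p in range(27) for _ in range(6 + p % 4)]
folds = patient_level_folds(toy_ids)
for k, (train, test) in enumerate(folds):
    test_patients = {toy_ids[i] for i in test}
    print(f"fold {k}: {len(test_patients)} test patients, "
          f"{len(test)} test clips, {len(train)} training clips")
```

Splitting by patient rather than by clip keeps clips from the same person out of both sets at once, which is what makes the test score a measure of generalization to unseen patients. With 27 patients and 5 folds, each fold holds out 5 or 6 patients, matching the description above.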

Evaluation metrics

Three evaluation metrics are used: the F1 score, the harmonic mean of precision and recall; the start and end time errors of the swallowing reflex, the differences in frame index between the predicted and labeled start and end points; and IoU (Intersection over Union), the ratio of the frame length of the intersection to the frame length of the union of the predicted and labeled swallowing reflex intervals.
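To make these three metrics concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the paper's evaluation code: intervals are assumed to be inclusive `(start_frame, end_frame)` pairs, at most one prediction is assumed per labeled event, and the function names and example intervals are hypothetical.

```python
def temporal_iou(pred, truth):
    """IoU of two frame intervals given as (start_frame, end_frame), inclusive."""
    intersection = min(pred[1], truth[1]) - max(pred[0], truth[0]) + 1
    intersection = max(0, intersection)
    union = (pred[1] - pred[0] + 1) + (truth[1] - truth[0] + 1) - intersection
    return intersection / union


def boundary_errors_seconds(pred, truth, fps=15):
    """Absolute start and end errors, converted from frames to seconds."""
    return abs(pred[0] - truth[0]) / fps, abs(pred[1] - truth[1]) / fps


def f1_at_threshold(pairs, threshold):
    """F1 when a prediction counts as correct only if its IoU >= threshold.

    pairs: list of (pred, truth); pred is None when nothing was detected.
    Simplifying assumption: at most one prediction per labeled event.
    """
    true_pos = sum(1 for p, t in pairs
                   if p is not None and temporal_iou(p, t) >= threshold)
    false_pos = sum(1 for p, t in pairs if p is not None) - true_pos
    false_neg = len(pairs) - true_pos
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical predictions: one good match, one poor overlap, one miss.
example_pairs = [((10, 17), (11, 17)), ((30, 33), (33, 40)), (None, (50, 56))]
print(temporal_iou(*example_pairs[0]))             # 7 shared frames / 8 in union
print(boundary_errors_seconds(*example_pairs[0]))  # start off by one frame
print(f1_at_threshold(example_pairs, threshold=0.2))
```

Raising the threshold can only turn matches into misses, so the F1 score at a threshold of 0.4 can never exceed the score at 0.2 for the same predictions.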

Evaluation results

Reliability of temporal measurements of the pharyngeal swallowing reflex

Using VFSS videos of 10 patients with dysphagia, intraclass correlation coefficients (ICCs) and 95% confidence intervals (CIs) were calculated to assess inter- and intra-rater reliability. For inter-rater reliability, two examiners, blinded to clinical information and to each other's measurements, measured the duration of the pharyngeal swallowing reflex at different time points. Both measures showed high agreement: intra-rater reliability, ICC = 0.982 (CI: 0.972-0.989); inter-rater reliability, ICC = 0.968 (CI: 0.939-0.983).

Evaluation results by model

The mean success rates in detecting the swallowing reflex class in the training and validation datasets were 98.2% and 97.5%, respectively. The differences between the predicted and labeled start and end of the swallowing reflex were reported to be 0.210 and 0.056 seconds, respectively. The F1 score depends on the IoU threshold (see figure below): at an IoU threshold of 0.2, the detection F1 scores were 94.7% (training) and 87.5% (validation); at an IoU threshold of 0.4, they were 74.7% (training) and 67.5% (validation).


In this study, we proposed a new method to automatically measure the reaction time of the pharyngeal swallowing reflex in VFSS. Specifically, the method learns from labeled pharyngeal images using image analysis techniques to measure the reaction time automatically. In the evaluation, the average success rate of class detection during the swallowing reflex was 98.2% (training) and 97.5% (validation). It can be inferred that this model will be a useful tool in clinical practice to estimate the absence or delay of the swallowing reflex in patients with dysphagia and to improve the low inter-rater reliability between skilled and unskilled clinicians in assessing the reaction time of the pharyngeal swallowing reflex.

In addition, the difference between the predicted response time of the swallowing reflex and the label was approximately 1 to 2.5 frames (0.067 to 0.167 s). In the VFSS, the normal response time of the swallowing reflex is 0.21 ± 0.26 s in healthy young adults and 0.53 ± 0.64 s in older adults (65 years and older). It can therefore be inferred that the prediction error is within the standard deviation of the swallowing reflex times in healthy subjects. These results suggest that our method may be useful for diagnosing absent or delayed pharyngeal swallowing reflexes in patients with dysphagia.
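The frame-to-second conversion behind these figures is simple arithmetic, sketched below. The sketch assumes the 15 FPS clip rate stated in the dataset description; the helper name `frames_to_seconds` is introduced here only for illustration.

```python
FPS = 15  # clip frame rate stated in the dataset description


def frames_to_seconds(n_frames, fps=FPS):
    """Convert a duration in frames to seconds."""
    return n_frames / fps


for n_frames in (1, 2.5):
    print(f"{n_frames} frames -> {frames_to_seconds(n_frames):.3f} s")
```

At 15 FPS one frame spans about 0.067 s, so the temporal resolution of the labels themselves limits how small a measured error can be.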

On the other hand, this study has two main limitations: a small sample size and the evaluation of reaction times only. First, the evaluation used only a small sample (about 20 samples), so the results may not be reliable; nevertheless, the results showed that the very short duration of the pharyngeal swallowing reflex can be measured with high accuracy, so the work may be useful as a preliminary study. Possible solutions include increasing the sample size and introducing approaches that can achieve high accuracy with few samples, such as fine-tuning and transfer learning. Second, we analyzed only the reaction time of the pharyngeal swallowing reflex and excluded other spatiotemporal parameters of the oral, pharyngeal, and esophageal phases of the swallowing process. Because disorders outside the pharyngeal phase are not evaluated, it is unclear whether the method is valid for all swallowing disorders. It is therefore necessary to integrate the proposed method with similar methods of interpreting the VFSS in clinical situations to demonstrate its validity more broadly.