
Exploring Facial Expression Recognition Techniques For The Intellectually Disabled Using The MuDERI Dataset


3 main points
✔️ Importance and challenges of application to people with intellectual disabilities: investigates how facial expression recognition technology can be used to accurately understand the emotional states of people with intellectual disabilities and improve their communication.

✔️ Training and analyzing deep learning models: tests how accurately models can predict the facial expressions of people with intellectual disabilities, using a dataset that includes this group (the MuDERI dataset).

✔️ Insights into dataset and model applicability: shows that general FER datasets do not fully capture the characteristics of people with intellectual disabilities, and that the facial areas a model focuses on differ markedly depending on the presence or absence of intellectual disability. Suggests that the lack of data specific to people with intellectual disabilities must be addressed to develop more accurate facial expression recognition techniques.

Evaluating the Feasibility of Standard Facial Expression Recognition in Individuals with Moderate to Severe Intellectual Disabilities
written by F. Xavier Gaya-Morey, Silvia Ramis, Jose M. Buades-Rubio, Cristina Manresa-Yee
(Submitted on 22 Jan 2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The images used in this article are from the paper, the introductory slides, or were created based on them.

Summary

Understanding the emotional states of people with intellectual disabilities is critical not only to improving individuals' quality of life, but also to enhancing communication and mutual understanding throughout society. With this importance in mind, this paper focuses on ways to maximize the potential of facial expression recognition (FER) technology to enhance the communication skills of people with intellectual disabilities. Using deep learning, which has achieved remarkable success in facial expression identification and analysis, we aim to answer the following key questions.

  • Do deep learning models trained on standard datasets perform well for persons with intellectual disabilities?
  • Can a model trained on a dataset that includes people with intellectual disabilities accurately predict the facial expressions of other people with intellectual disabilities?
  • Are there differences and similarities in facial expressions between people with and without intellectual disabilities?

To answer these questions, we train and test multiple neural networks both on datasets that do not include persons with intellectual disabilities and on the MuDERI dataset, which does. Finally, we use Explainable Artificial Intelligence (XAI) techniques to analyze and visualize which areas of the face each model deems important.

This paper delves deeply into the applicability of FER technology to people with intellectual disabilities and aims to make an innovative contribution in this area. It also lays out a comprehensive methodology, from dataset curation and model selection to data preprocessing and XAI strategies. Through this research, we expect to contribute not only to improving the quality of life of people with intellectual disabilities, but also to the further advancement of FER technology.

Proposed Method

The data used in this paper come from seven datasets relevant to facial expression recognition (FER). Four of them are widely known as standard benchmarks in FER research:

  • Extended Cohn-Kanade (CK+): includes 593 facial expression sequences from 123 participants, labeled with seven expressions (anger, contempt, disgust, fear, happiness, sadness, surprise).
  • BU-4DFE: 606 facial expression sequences from 101 participants, covering six facial expressions.
  • JAFFE: 213 facial images of 10 Japanese actresses, covering six expressions.
  • WSEFEP: 210 images from 30 participants, covering seven facial expression categories similar to those of the JAFFE dataset.

In addition, the following three datasets are added:

  • FEGA: includes multiple facial expression sequences from 51 participants, multi-labeled for facial expression, gender, and age.
  • FEtest: consists of 210 frontal images taken under natural conditions.
  • MuDERI: a multimodal dataset of 12 participants with intellectual disabilities, containing audiovisual recordings designed to elicit positive and negative emotions.

By combining these diverse datasets, the study builds a corpus that covers a wide range of conditions, from basic facial expressions to emotion recognition under special conditions. This is an important step in the development and evaluation of FER technology.
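The paper does not publish its preprocessing code, but a typical curation step is merging datasets with differing label vocabularies into one corpus. The sketch below illustrates this under assumed conditions: the directory layout, dataset names, and label maps are all hypothetical, and the shared label space is the seven CK+ expressions.

from pathlib import Path

# Shared target label space: the seven basic expressions labeled in CK+.
TARGET_LABELS = ["anger", "contempt", "disgust", "fear",
                 "happiness", "sadness", "surprise"]

# Hypothetical per-dataset label maps onto the shared vocabulary.
# JAFFE lacks "contempt", so that class simply never appears for it.
LABEL_MAPS = {
    "ck+":   {label: label for label in TARGET_LABELS},
    "jaffe": {"angry": "anger", "disgust": "disgust", "fear": "fear",
              "happy": "happiness", "sad": "sadness", "surprise": "surprise"},
}

def collect_samples(root: Path):
    """Walk <root>/<dataset>/<source_label>/*.png, yielding (path, class_id)."""
    for dataset, label_map in LABEL_MAPS.items():
        for src_label, tgt_label in label_map.items():
            for img_path in sorted((root / dataset / src_label).glob("*.png")):
                yield img_path, TARGET_LABELS.index(tgt_label)

if __name__ == "__main__":
    samples = list(collect_samples(Path("data")))
    print(f"Merged corpus: {len(samples)} images, {len(TARGET_LABELS)} classes")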

The study also employs 12 different networks for facial expression recognition (FER). These models aim to raise the accuracy and efficiency of expression recognition by combining general-purpose architectures with models designed specifically for FER. The models used include:

  • General Architecture
    • AlexNet, VGG16, VGG19: combine multiple convolutional and pooling layers to extract high-level features from images; the VGG models are known for their simple, uniform architecture.
    • ResNet50, ResNet101V2: deep network structures that use residual connections to address the vanishing gradient problem.
    • InceptionV3: efficiently captures features by applying filters of various sizes in parallel.
    • Xception: uses depthwise-separable convolutions to increase computational efficiency.
    • MobileNetV3: optimized for mobile devices, providing high performance at low cost.
    • EfficientNetV2: a scaling strategy enables efficient performance improvement.
  • Architecture dedicated to FER
    • SilNet, SongNet, and WeiNet: Designed specifically for FER, they provide relatively simple yet effective facial expression recognition.

These models are trained on the multiple preprocessed datasets so as to capture different aspects of facial expressions. By comparing the performance of these diverse architectures, the research team investigates how the choice of architecture affects results in the facial expression recognition task.
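The article ships no code, but most of the general architectures above are available as ImageNet-pretrained backbones in standard frameworks. As a minimal sketch (assuming Keras, a 224x224 input, and a seven-class label space, none of which is confirmed by the article), one such backbone can be adapted for FER by swapping its classifier head:

import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 7  # e.g., the seven basic expressions

def build_fer_model(backbone_name: str = "ResNet50") -> tf.keras.Model:
    """Attach a small FER classification head to a pretrained backbone."""
    backbone_cls = getattr(tf.keras.applications, backbone_name)
    backbone = backbone_cls(include_top=False, weights="imagenet",
                            input_shape=(224, 224, 3), pooling="avg")
    x = layers.Dropout(0.3)(backbone.output)             # light regularization
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = models.Model(backbone.input, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    build_fer_model("MobileNetV3Small").summary()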

Experiment

Three experiments are conducted. The first is a performance evaluation of models trained on the FER datasets: the objective is to assess whether the various networks, trained on an extended dataset designed for the FER task, can accurately classify the facial expressions of persons with intellectual disabilities in the MuDERI dataset.
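Conceptually, this is a cross-dataset evaluation: train on the extended FER corpus, then score on MuDERI. The loop below is a hedged sketch of that protocol, reusing the hypothetical build_fer_model helper from the previous snippet, with random tensors standing in for the real frames:

import numpy as np

# Stand-in arrays in place of the real FER-DB5 and MuDERI frames.
x_fer = np.random.rand(64, 224, 224, 3).astype("float32")
y_fer = np.random.randint(0, 7, size=64)
x_muderi = np.random.rand(32, 224, 224, 3).astype("float32")
y_muderi = np.random.randint(0, 7, size=32)

results = {}
for name in ["VGG16", "ResNet50", "MobileNetV3Small"]:
    model = build_fer_model(name)               # helper from the sketch above
    model.fit(x_fer, y_fer, epochs=1, batch_size=8, verbose=0)
    _, acc = model.evaluate(x_muderi, y_muderi, verbose=0)
    results[name] = acc                         # accuracy on the unseen dataset
print(results)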

As a visualization of the experimental results, the following figure shows the results obtained with the FER-DB5 dataset as boxplots, with the distribution over the different training runs represented by the median, interquartile range, and outliers.


On the MuDERI dataset, accuracy stays below 55%, an unsatisfactory result across all networks. On the FEtest set, by contrast, most networks achieve more than 80% accuracy, with all networks except ResNet50 performing well. The results also show that training with MuDERI produces more variable accuracy than training with FER-DB5.

The second experiment trains with MuDERI. The goal is to evaluate whether a model trained on a dataset containing persons with intellectual disabilities can accurately predict the facial expressions of other persons with intellectual disabilities. The results of the analysis are shown in the figure below.

Four scenarios are examined here:

  1. User-based split: MuDERI is divided by user, training on some users and evaluating on the rest.
  2. Clip-based split: the division is done by clip, ensuring that the model "sees" every user during training and is exposed to user-specific facial expressions.
  3. Constrained clip-based split: similar to the second scenario, but with an additional constraint: clips from users who have only one clip for a particular class are included only in the training set, never the test set. This assesses the model's ability to recognize a user's expressions when it has encountered other clips of the same user and class during training.
  4. Frame-based split: the division is done by frame, with adjacent frames randomly assigned to the training or test subset (see the sketch after this list).
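The essential difference between these scenarios is how much user identity leaks from training into testing. The toy sketch below (not the authors' code) contrasts the two extremes using scikit-learn's group-aware splitter:

import numpy as np
from sklearn.model_selection import GroupShuffleSplit, train_test_split

# Toy metadata: 1,200 frames from 12 users (100 frames each), binary labels.
rng = np.random.default_rng(0)
frame_ids = np.arange(1200)
user_ids = np.repeat(np.arange(12), 100)
labels = rng.integers(0, 2, size=1200)

# Scenario 1 (user-based split): whole users are held out, so the model
# never sees a test user during training -- the hardest, most realistic case.
gss = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(gss.split(frame_ids, labels, groups=user_ids))
assert set(user_ids[train_idx]).isdisjoint(user_ids[test_idx])

# Scenario 4 (frame-based split): adjacent frames from the same clip can land
# on both sides, so every user is effectively "seen" -- the easiest case.
train_f, test_f = train_test_split(frame_ids, test_size=0.25, random_state=0)
print(len(train_idx), len(test_idx), len(train_f), len(test_f))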

The user-based split shows the poorest results, the two clip-based splits achieve similar intermediate accuracy, and the frame-based split achieves the best accuracy. On a per-network basis, results vary by scenario: EfficientNetV2 performs best in the first and second scenarios but worse in the third and fourth, while MobileNetV3 consistently shows the lowest results.

The third experiment assesses whether there are differences and similarities in facial expressions between people with and without intellectual disabilities. To this end, heatmaps obtained through training and testing on FER-DB5 and MuDERI were created and analyzed; they are shown in the figure below.


Comparing rows 1 and 2 with row 3 (MuDERI) for sadness, happiness, and anger reveals a clear difference in trend: the facial areas the model focuses on differ markedly depending on whether the person has an intellectual disability, with more complex and counterintuitive areas being relevant for people with disabilities.
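The article does not name the XAI method behind these heatmaps; Grad-CAM is one widely used technique for producing exactly this kind of class-relevance map over convolutional features, so the following minimal sketch is offered under that assumption, for a Keras model like the one sketched earlier:

import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name, class_idx):
    """Grad-CAM heatmap in [0, 1] for class_idx, given an (H, W, 3) image."""
    grad_model = tf.keras.Model(
        model.input,
        [model.get_layer(conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis].astype("float32"))
        class_score = preds[:, class_idx]
    grads = tape.gradient(class_score, conv_out)    # d(score) / d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))    # global-average-pooled grads
    cam = tf.einsum("bhwc,bc->bhw", conv_out, weights)[0]
    cam = tf.nn.relu(cam)                           # keep only positive evidence
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()

For a ResNet50 backbone, conv_layer_name would typically be the last convolutional block (conv5_block3_out in the Keras implementation); the low-resolution map is then upsampled and overlaid on the face image to yield heatmaps like those in the figure.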

Summary

This paper focuses on the application and challenges of facial expression recognition (FER) technology for individuals with moderate to severe intellectual disabilities. Specifically, 12 deep learning models were trained on a variety of datasets, including the MuDERI dataset, which was compiled specifically around people with intellectual disabilities. The study leverages Explainable Artificial Intelligence (XAI) techniques to investigate how the models interpret the facial expressions of different user groups.

The study also shows that general FER datasets do not adequately capture the characteristics of persons with intellectual disabilities, and that it is essential to study this user group directly. The facial areas the models focused on differed markedly depending on the presence or absence of intellectual disability, with more complex and counterintuitive areas being relevant for people with disabilities.

Future research could address the lack of FER data specific to people with intellectual disabilities and seek to develop more comprehensive and accurate facial expression recognition techniques. Data enrichment in this area is critical to increasing the effectiveness of deep learning methods and making the technology more equitable and accessible.

Takumu
I have worked as a Project Manager/Product Manager and Researcher at internet advertising companies (DSP, DMP, etc.) and machine learning startups. Currently, I am a Product Manager for new business at an IT company. I also plan services utilizing data and machine learning, and conduct seminars related to machine learning and mathematics.
