Transformer Predicts Age!


3 main points
✔️ Study on age prediction from brain MRI
✔️ Propose a Global-Local Transformer suitable for MRI of the brain
✔️ Achieve age prediction with higher accuracy than conventional methods

Global-Local Transformer for Brain Age Estimation
written by Minghao Chen, Houwen Peng, Jianlong Fu, Haibin Ling
(Submitted on 1 Jul 2021)
Comments: Published on arXiv.

Subjects: Computer Vision and Pattern Recognition (cs.CV)


The images used in this article are from the paper, the introductory slides, or were created based on them.


Deep learning is a rapidly developing field, and its effects are also being seen in the field of healthcare. One such example is brain age estimation, which predicts a person's age based on an MRI of the brain.

The difference between the age predicted by brain age estimation and a person's actual age is said to be related to the health of the brain, so it can serve as an indicator of health status. However, conventional models extract features only from the whole brain MRI and cannot take into account the fine-grained features that the MRI contains.

This paper proposes the Global-Local Transformer, a model that extracts fine-grained local features in addition to the features of the whole image.

The paper compares the Global-Local Transformer with conventional methods and finds that it estimates age more accurately. It also examines which parts of the MRI the Global-Local Transformer focuses on when estimating brain age.

In this article, we explain the Global-Local Transformer, present the comparison experiments with conventional methods, and show the results of visualizing the regions the model focuses on.

Proposed method: Global-Local Transformer

The overall architecture of the proposed method is shown in the above figure. The input data are the whole MRI image (top) and a patch image randomly cut from the MRI (bottom). The proposed method consists of two stages: the Backbone and the Global-Local Transformer.
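As a rough illustration of the random patch input described above, the following sketch cuts a patch at a random position from an image. It assumes the image is a 2D array stored as nested lists; the function name random_patch, the toy image, and the patch size are hypothetical and are not taken from the paper.

```python
import random

def random_patch(image, patch_h, patch_w, rng=random):
    """Cut a patch_h x patch_w patch at a random position from a 2D image.

    Returns the patch and the (top, left) corner it was taken from.
    """
    height, width = len(image), len(image[0])
    if patch_h > height or patch_w > width:
        raise ValueError("patch must fit inside the image")
    top = rng.randint(0, height - patch_h)    # randint is inclusive on both ends
    left = rng.randint(0, width - patch_w)
    patch = [row[left:left + patch_w] for row in image[top:top + patch_h]]
    return patch, (top, left)

# Toy 6x8 "image" whose pixel value encodes its own position (row * 10 + col).
image = [[r * 10 + c for c in range(8)] for r in range(6)]
patch, (top, left) = random_patch(image, 3, 4, random.Random(0))
```

Returning the corner position alongside the patch is a design choice of this sketch; it makes it easy to check where in the image a local feature came from.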


The Backbone extracts CNN features from both the whole image and the patch image. The CNN is built from convolutional layers, Batch Normalization, ReLU, and max pooling, as shown in the figure below.
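The order of operations in such a block can be sketched in plain Python. This is only a toy under stated assumptions: one channel, one 3x3 kernel with arbitrary values, no padding, Batch Normalization in training mode without a learned scale or shift, and 2x2 pooling. The actual layer counts, kernel sizes, and channel widths of the paper's Backbone are not given in this article.

```python
import math

def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation of one channel with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def batch_norm(batch, eps=1e-5):
    """Normalize one channel with the mean and variance of the whole batch."""
    values = [v for fmap in batch for row in fmap for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = 1.0 / math.sqrt(var + eps)
    return [[[(v - mean) * scale for v in row] for row in fmap] for fmap in batch]

def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling; an odd trailing row or column is dropped."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

def backbone_block(batch, kernel):
    """Convolution -> Batch Normalization -> ReLU -> max pooling."""
    conv = [conv2d(img, kernel) for img in batch]
    normed = batch_norm(conv)
    return [max_pool2x2(relu(fmap)) for fmap in normed]

# Arbitrary kernel and a batch of two toy 6x6 images (both are assumptions).
kernel = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]
batch = [[[float((r * c + k) % 7) for c in range(6)] for r in range(6)]
         for k in range(2)]
features = backbone_block(batch, kernel)  # each 6x6 image becomes a 2x2 map
```

In practice such a block would be written with a deep learning framework and many channels; the plain-Python version is only meant to make the data flow explicit.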

Global-Local Transformer

The Global-Local Transformer performs age prediction using the whole MRI image and patch image features extracted by Backbone as input.

The difference between Global-Local Transformer and the original Transformer is that instead of Self-Attention, the Global-Local Attention proposed in this paper is used. Also, Global-Local Transformer does not use Layer Normalization, and the features extracted from the patch image are combined with the output of Global-Local Attention.

Global-Local Attention, which is the crux of this paper, is shown in the figure below.

Global-Local Attention uses features from the patch image for query and features from the whole image for key and value. In this way, we can combine the fine-grained features in the MRI with the features of the whole MRI.

For the experiments in the next section, a model with six stacked Global-Local Transformers is used, as a trade-off between accuracy and prediction time. (From the second Global-Local Transformer onward, the output of the previous Global-Local Transformer is used as the query.)
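The attention and stacking described above can be sketched as follows. This is a simplified illustration, not the paper's implementation: it uses standard scaled dot-product attention, omits the learned query, key, and value projections and the feed-forward layers, and combines the patch features with the attention output by simple addition (the article only says they are "combined"). All function names and toy feature values are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    peak = max(row)                       # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def global_local_attention(query, global_feats):
    """query: (n_local, d) from the patch; global_feats: (n_global, d) keys and values."""
    d = len(query[0])
    scores = matmul(query, transpose(global_feats))          # (n_local, n_global)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, global_feats), weights            # (n_local, d)

def global_local_block(query, local_feats, global_feats):
    """Attend to the whole image, then combine with the patch features.

    Addition is an assumption made for this sketch.
    """
    attended, _ = global_local_attention(query, global_feats)
    return [[a + l for a, l in zip(a_row, l_row)]
            for a_row, l_row in zip(attended, local_feats)]

def stacked_global_local(local_feats, global_feats, num_blocks=6):
    """The first block is queried by the patch features, later blocks by the previous output."""
    query = local_feats
    for _ in range(num_blocks):
        query = global_local_block(query, local_feats, global_feats)
    return query

# Toy features: 2 local tokens and 4 global tokens, each of dimension 3.
local_feats = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
global_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.5]]
attended, weights = global_local_attention(local_feats, global_feats)
output = stacked_global_local(local_feats, global_feats)
```

Because the query comes from the patch and the keys and values come from the whole image, each local token receives a weighted mixture of global features, which is how local detail is linked to global context.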

Brain age estimation


To evaluate the model, we use the eight brain MRI datasets shown in the table above.

N_samples is the number of samples, Age range is the age range, and Gender is the gender breakdown.

In this article, we present only the experimental results of 5-fold cross-validation on the combined dataset made from the top six datasets.
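For readers unfamiliar with 5-fold cross-validation, here is a minimal sketch of how the sample indices can be split. The shuffling, the seed, and the sample count of 23 are assumptions chosen for illustration; the article does not describe how the paper's folds were built.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, val_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val

splits = list(k_fold_indices(23))
```

Each sample is used for validation exactly once and for training in the other four folds, and the reported metrics are then averaged over the folds.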

Experimental results

The MAE between the predicted and actual age is used as the loss function during training.

The evaluation metrics are the MAE and the Pearson correlation coefficient between predicted and actual age, together with CS (α=5). CS (α=5) is the percentage of samples for which the absolute difference between predicted and actual age is within 5 years.
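The three metrics can be computed as in the sketch below. The function names and the toy ages are hypothetical, and counting an error of exactly α years as a hit is an assumption about the boundary case.

```python
import math

def mae(pred, true):
    """Mean absolute error in years."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def pearson(pred, true):
    """Pearson correlation coefficient between predicted and actual age."""
    n = len(true)
    mean_p, mean_t = sum(pred) / n, sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    norm_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    norm_t = math.sqrt(sum((t - mean_t) ** 2 for t in true))
    return cov / (norm_p * norm_t)

def cumulative_score(pred, true, alpha=5.0):
    """Percentage of samples whose absolute error is at most alpha years."""
    hits = sum(1 for p, t in zip(pred, true) if abs(p - t) <= alpha)
    return 100.0 * hits / len(true)

# Toy example with absolute errors of 2, 1, 7, and 2 years.
true_age = [10.0, 25.0, 40.0, 60.0]
pred_age = [12.0, 24.0, 47.0, 58.0]

print(mae(pred_age, true_age))               # 3.0
print(cumulative_score(pred_age, true_age))  # 75.0
print(pearson(pred_age, true_age))
```

A lower MAE is better, while a higher correlation coefficient and a higher CS are better.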

The results of the 5-fold cross-validation are shown in the table above. The top eight models are general image recognition models, and the bottom eight, including the proposed method, are models designed for age prediction. The proposed method outperforms the previous methods on all three metrics: MAE, correlation coefficient, and CS (α=5).


The following figure shows a heatmap visualizing which part of the brain MRI is focused on by the proposed method for brain age estimation.

From the above figure, we can see that all brain MRIs focus on almost the same region (red area). This indicates that certain regions contain information that is important for brain age estimation.

The figure below shows the regions of interest visualized by age group. The numbers below each heatmap indicate the age of the subjects.

From the above figure, we can see that the focus of attention differs depending on the age of the subjects. For example, for subjects aged 0 to 5 the model focuses on the frontal lobe (the upper left part of the heatmap), while for subjects aged 30 to 35 it focuses on the parietal lobe (the lower right part of the heatmap).


This article introduced the Global-Local Transformer, a model for predicting age from brain MRI. By incorporating fine-grained local features of the MRI into the Transformer, it predicts age more accurately than conventional methods. The paper also raises some limitations of the study, such as "the dataset does not include data of prevalent patients" and "there is a bias in the age of patients in the dataset".

Resolving these issues should lead to further progress in brain age estimation.
