
ArtEmis V2.0 Is Now Available, Removing The Emotional Bias In The Painting Dataset!


3 main points
✔️ Identifies biases in the distribution of emotions and captions that arise from emotional bias in the ArtEmis collection process
✔️ Creates ArtEmis v2.0 using a contrastive data collection method to remove these emotional biases
✔️ Achieves higher-quality caption generation than ArtEmis by using the complementary dataset obtained with this method

It is Okay to Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection
written by Youssef Mohamed, Faizan Farooq Khan, Kilichbek Haydarov, Mohamed Elhoseiny
(Submitted on 15 Apr 2022)
Comments: CVPR2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)


The images used in this article are from the paper, the introductory slides, or were created based on them.

Introduction

Emotions play a central role in determining human mental states and behavior, and modeling these emotions is essential for improving social acceptance in applications and interactive AI.

However, datasets that capture the relationship between vision, language, and emotion are still very limited, and this is one of the reasons why our understanding of human emotional traits has not progressed.

In recent years, ArtEmis has been proposed as a large-scale dataset annotated with emotional responses to visual art and linguistic explanations for these emotions.

The paper presented in this article identifies the emotional biases in ArtEmis and proposes a contrastive data collection method to remove them.

Emotional Bias in ArtEmis

Social psychologist Plus argues that bias is a way for humans to optimize brain function without paying attention. Because humans label the data when a dataset is created, it is inevitable that the collected data will contain bias.

These biases are often mild but can develop into social problems, especially in applications used for ethical decisions and human interactions.

The recently proposed ArtEmis, a large dataset of annotated emotions towards visual art, also contains such biases: the authors of this paper found a bias in the distribution of the emotions and captions assigned to the paintings.

Each ArtEmis caption is labeled with one of nine emotion categories: four positive emotions ('Amusement', 'Awe', 'Contentment', and 'Excitement'), four negative emotions ('Anger', 'Disgust', 'Fear', and 'Sadness'), and 'Something Else'. Positive emotions account for 62% of the annotations and negative emotions for 26%, which means that the distribution of emotions is highly uneven and lacks diversity.
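The imbalance described above can be measured with a few lines of code. The following is a minimal sketch, assuming the annotations are available as a flat list of emotion labels; the function name `polarity_shares` and the toy labels are hypothetical and are not taken from the paper or from the ArtEmis tooling.

```python
from collections import Counter

# The eight emotion categories named in the article, grouped by polarity.
POSITIVE = {"amusement", "awe", "contentment", "excitement"}
NEGATIVE = {"anger", "disgust", "fear", "sadness"}


def polarity_shares(labels):
    """Return the fraction of labels that are positive, negative, or other."""
    counts = Counter()
    for label in labels:
        key = label.strip().lower()
        if key in POSITIVE:
            counts["positive"] += 1
        elif key in NEGATIVE:
            counts["negative"] += 1
        else:
            # Covers 'Something Else' and any unexpected label.
            counts["other"] += 1
    total = sum(counts.values())
    if total == 0:
        return {"positive": 0.0, "negative": 0.0, "other": 0.0}
    return {k: counts[k] / total for k in ("positive", "negative", "other")}


# Toy labels chosen to mimic the reported ArtEmis skew (not real data).
toy_labels = ["awe"] * 62 + ["sadness"] * 26 + ["something else"] * 12
shares = polarity_shares(toy_labels)
print(shares)
```

Running the same function on a dataset before and after adding complementary annotations is one simple way to check whether the positive and negative shares have moved closer together.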

To correct this skewed emotion distribution and remove the emotional bias, the paper proposes collecting a complementary dataset through a contrastive data collection method.

The contrastive data collection interface

Next, we describe the contrastive data collection method proposed in this paper.

A major problem with the existing ArtEmis is that paintings of similar style receive similar captions. (This is also why the Nearest-Neighbor model, which retrieves the training painting closest to the test painting, performs abnormally well in the experiments in the original ArtEmis paper.)
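To make the Nearest-Neighbor idea concrete, here is a minimal sketch, assuming each painting is already represented by a numeric feature vector. The Euclidean distance, the function names, and the toy data are assumptions for illustration; the article does not specify which features or distance the baseline uses.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_caption(test_feature, train_features, train_captions):
    """Return the caption of the training painting closest to the test painting."""
    best_index = min(
        range(len(train_features)),
        key=lambda i: euclidean(test_feature, train_features[i]),
    )
    return train_captions[best_index]


# Toy features and captions (hypothetical, for illustration only).
train_features = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
train_captions = [
    "the calm water feels peaceful",
    "the bright colors look joyful",
    "the dark sky feels threatening",
]
predicted = nearest_neighbor_caption([0.9, 1.2], train_features, train_captions)
print(predicted)
```

If visually similar paintings tend to carry similar captions, copying the neighbor's caption in this way scores well, which is why a strong Nearest-Neighbor result can be read as a symptom of the bias.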

Therefore, the paper proposes a contrastive data collection method that removes the emotional bias among such neighboring paintings and produces a dataset with more diverse emotion captions.

The data collection interface of this paper is shown in the figure below.

First, given a random painting and a list of its emotions as shown in Figure (a), the subject selects, from the 24 similar paintings shown below it, the painting that best evokes the opposite emotion to the given painting. (If no suitable painting is found, the subject selects "No Image Available" to avoid emotional bias.)

The subject is then asked to annotate the emotion they feel towards the selected painting and to explain why they feel that way, as shown in Figures (b) and (c).

As shown in the figure below, this interface lets the subject attach the opposite emotion to a painting similar to one that is already annotated, which remedies the ArtEmis problem of similar captions being given to paintings of similar style.
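The candidate list in this interface can be thought of as a top-k similarity search. The sketch below assumes that paintings are represented by feature vectors and that similarity is cosine similarity; both are assumptions made for illustration, since the article only says that 24 similar paintings are shown. The names `similar_candidates` and `features` are hypothetical.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_candidates(query_id, features, k=24):
    """Return the ids of the k paintings most similar to the query painting."""
    query = features[query_id]
    scored = (
        (cosine_similarity(query, vector), painting_id)
        for painting_id, vector in features.items()
        if painting_id != query_id  # never offer the query painting itself
    )
    return [painting_id for _, painting_id in heapq.nlargest(k, scored)]


# Toy feature vectors keyed by painting id (hypothetical data).
features = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.7, 0.7],
}
candidates = similar_candidates("a", features, k=2)
print(candidates)
```

In the real interface the annotator, not the code, makes the final choice among the candidates, including the option of selecting no painting at all.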

In this paper, a total of 52933 emotion-biased paintings were identified in ArtEmis, each of which was annotated through the above interface by at least five people, for a total of 260533 collected instances (of which 7752 were "No Image Available").

The existing ArtEmis captions had a large distribution bias of 62% positive and 26% negative emotion. In the new dataset obtained by combining ArtEmis with the complementary dataset collected through the above interface (hereafter referred to as the Contrastive dataset), positive emotions account for 47% and negative emotions for 45%, a much more balanced distribution. This new dataset is hereafter referred to as the Combined dataset.

Qualitative analysis

The figure below shows samples from the Contrastive dataset. Each pair consists of a random painting on the left and, on the right, the painting judged most appropriate for evoking the opposite emotion, with the existing caption at the bottom and the caption collected by this method at the top. (Two pairs are shown, one on the left and one on the right.)

As can be seen in the figure, the existing captions were simple and did not contain many emotional expressions. In contrast, because this method imposes the constraint of choosing a painting that evokes the opposite emotion, the subjects tended to pay more attention to the details of the painting and to use more emotional expressions.

Quantitative analysis

The following figures show the emotion distribution in each dataset. They confirm that combining the existing ArtEmis dataset with the complementary dataset collected by this method results in a well-balanced emotion distribution.

Following semantic space theory, the paper also investigates the correlation between the distributions of the individual emotions in the Combined dataset and in the existing ArtEmis, as shown in the figure below.

The darker the color of the patch, the lower the correlation between different emotions. The results of this analysis show that the emotions in the Combined dataset are less correlated with one another than in ArtEmis, which suggests that the Combined dataset represents each emotion more distinctly.
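A correlation analysis of this kind can be sketched as follows. This is only an illustration under stated assumptions: each painting is summarized by how many annotators chose each emotion, and the Pearson correlation is computed between the per-painting counts of every pair of emotions. The article does not state the exact procedure used in the paper, and the function names and toy votes are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if one is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def emotion_correlations(votes, emotions):
    """Correlate per-painting vote counts for every pair of emotions.

    votes: one dict per painting, mapping emotion name -> number of annotators.
    """
    columns = {e: [v.get(e, 0) for v in votes] for e in emotions}
    return {
        (a, b): pearson(columns[a], columns[b])
        for a in emotions
        for b in emotions
    }


# Toy votes for three paintings (hypothetical data).
votes = [
    {"awe": 3, "fear": 0},
    {"awe": 1, "fear": 2},
    {"awe": 0, "fear": 3},
]
corr = emotion_correlations(votes, ["awe", "fear"])
print(corr[("awe", "fear")])
```

In this toy example the two emotions are perfectly anti-correlated, because every vote that is not 'awe' goes to 'fear'.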


Following existing studies, the paper uses the following models in its experiments.

  • Nearest-Neighbor (NN), which retrieves the training example nearest to the test example
  • Show-Attend-Tell (SAT), an LSTM-based captioning model with attention
  • Meshed-Memory Transformer (M2), which replaces the recurrent structure with a transformer and uses bounding-box features computed separately by a CNN
  • A modified version of the M2 above (modified M2)
    • The usual M2 uses object features as the image representation. However, this may not be suitable for some ArtEmis paintings because they do not contain objects (e.g. abstract paintings), so the modified M2 model instead extracts patch features by dividing the painting into P × P patches (P = 4 in this experiment).
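The P × P patch division used by the modified M2 can be illustrated with a small sketch. It assumes the image is a 2D grid of numbers whose height and width are divisible by P, and it uses the mean pixel value as a stand-in for a patch feature; the actual feature extractor is not described in this article, so `mean_feature` is purely a hypothetical placeholder.

```python
def split_into_patches(image, p=4):
    """Split a 2D image (list of rows) into p * p non-overlapping patches.

    Patches are returned in row-major order. This sketch assumes that the
    height and width are both divisible by p.
    """
    height, width = len(image), len(image[0])
    if height % p or width % p:
        raise ValueError("image size must be divisible by p")
    patch_h, patch_w = height // p, width // p
    patches = []
    for row in range(p):
        for col in range(p):
            patch = [
                line[col * patch_w:(col + 1) * patch_w]
                for line in image[row * patch_h:(row + 1) * patch_h]
            ]
            patches.append(patch)
    return patches


def mean_feature(patch):
    """Placeholder feature: the mean value of the patch (an assumption)."""
    values = [value for line in patch for value in line]
    return sum(values) / len(values)


# Toy 8 x 8 "image" whose pixel values are simply their row-major index.
image = [[y * 8 + x for x in range(8)] for y in range(8)]
patches = split_into_patches(image, p=4)
patch_features = [mean_feature(patch) for patch in patches]
print(len(patches), patch_features[0])
```

With P = 4 every painting yields 16 patch features regardless of whether it contains detectable objects, which is the property that makes this representation applicable to abstract paintings.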

The table below shows the results of training these models on the Combined dataset.

It is worth noting that the Nearest-Neighbor (NN) model performs the worst here, in contrast to the existing results of training on ArtEmis. This suggests that the NN model no longer performs well because the emotional bias in the annotations has been removed in the Combined dataset.

In addition, the performance of the modified M2 slightly exceeded that of the existing M2, which supports the authors' hypothesis that feature extraction using only bounding boxes is not suitable for paintings.

Sample captions generated by SAT trained on the Combined dataset are shown in the figure below. (Top: caption generation only. Bottom: emotion-based caption generation.)

Compared to the generation on the existing ArtEmis dataset, the models trained on the Combined dataset were found to generate higher-quality captions that captured the features of the paintings.


How was it? In this article, we introduced a paper that identified biases in the distribution of emotions and captions caused by emotional bias in the ArtEmis collection process, and that proposed a contrastive data collection method to remove these emotional biases.

Although this paper focused only on emotional bias, there may be other biases in ArtEmis that have not yet been addressed, such as bias related to race and ethnicity, so future developments will be of interest.

If you are interested, you can find samples of the datasets and generated captions introduced in this article in the paper.

