
PAM: Predictive Model For Aesthetic Evaluation Of Personalized Images

3 main points
✔️ addressed the problem of predicting an individual user's aesthetic ratings of images (personalized image aesthetics)
✔️ collected two new datasets and built a new prediction model that learns each user's offset from a generic aesthetic score
✔️ proposed an active learning algorithm that efficiently learns a user's aesthetic preferences for real-world applications

Personalized Image Aesthetics
written by Jian Ren, Xiaohui Shen, Zhe Lin, Radomir Mech, David J. Foran
(Submitted on 25 December 2017)
Comments: Published in 2017 IEEE International Conference on Computer Vision (ICCV)

The images used in this article are from the paper, the introductory slides, or were created based on them.


Aesthetic evaluation of images has a variety of potential applications, including image retrieval, photo ranking, and curation of personal albums. It is a challenging problem that requires a sophisticated understanding of the photographic attributes and semantics of images. Although recent advances in deep learning have enabled significant progress, most work has been limited to predicting generic aesthetic judgments shared by the general public, and little research has addressed predicting the aesthetic preferences of a specific individual.

This paper addresses the problem of predicting an individual user's aesthetic ratings of images, which it calls personalized image aesthetics.



We downloaded 40,000 Creative Commons-licensed photos from Flickr and collected ratings from 210 annotators on Amazon Mechanical Turk, with each image rated by five different annotators on a 5-point scale from 1 to 5. This dataset is called FLICKR-AES.

The resulting dataset is split into a training set and a validation set by annotator, so that no annotator appears in both sets.
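A minimal sketch of this preparation, assuming the ratings are stored as hypothetical `(annotator_id, image_id, score)` tuples: `split_by_annotator` assigns whole annotators, not individual ratings, to one side of the split, and `ground_truth` averages each image's ratings into the per-image mean score that is used as the ground truth later in the article. Both helper names and the 20% test fraction are illustrative choices, not values from the paper.

```python
import random
from statistics import mean
from typing import Dict, List, Tuple

Rating = Tuple[str, str, float]  # (annotator_id, image_id, score on the 1-5 scale)


def ground_truth(ratings: List[Rating]) -> Dict[str, float]:
    """Mean rating per image across all annotators who rated it."""
    per_image: Dict[str, List[float]] = {}
    for _, image, score in ratings:
        per_image.setdefault(image, []).append(score)
    return {image: mean(scores) for image, scores in per_image.items()}


def split_by_annotator(
    ratings: List[Rating], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Rating], List[Rating]]:
    """Put each annotator's ratings entirely in either the train or the test split."""
    annotators = sorted({a for a, _, _ in ratings})
    random.Random(seed).shuffle(annotators)
    n_test = max(1, round(len(annotators) * test_fraction))
    test_annotators = set(annotators[:n_test])
    train = [r for r in ratings if r[0] not in test_annotators]
    test = [r for r in ratings if r[0] in test_annotators]
    return train, test


# Toy data: 10 annotators, each rating the same 3 images.
toy = [(f"a{k}", f"img{i}", float((k + i) % 5 + 1)) for k in range(10) for i in range(3)]
train, test = split_by_annotator(toy)
print(len(train), len(test))  # 24 6
print(ground_truth(toy)["img0"])  # 3.0
```

Splitting by annotator rather than by rating means the validation set measures how well a model adapts to people it has never seen, which is the situation a personalized model faces in practice.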


An additional dataset, REAL-CUR, was also collected. It consists of photos from 14 personal albums, each rated by the album's owner, with about 200 photos per album.

Analysis of individual user preferences

FLICKR-AES is used to investigate the correlation between individual user ratings and various image characteristics. Two kinds of factors that influence aesthetic ratings are considered: image content attributes (semantic categories) and aesthetic attributes (e.g., symmetry).

First, 111 annotators are selected from the training data, and Spearman's rank correlation is used to measure how their preferences relate to the content and aesthetic attributes of the images. The mean rating across annotators is taken as the ground truth, and each annotator's offset from it (the residual) is correlated with each attribute.

The figure above shows the correlation between each attribute and the ratings of eight randomly selected annotators. It shows that different individuals prefer different attributes.

Random1 and Random2 are two hypothetical annotators, created by randomly selecting 1,000 images from the training set and starting from the mean annotator rating (ground truth) of each image. Their ratings were obtained by random perturbations of 0.2 and 2 standard deviations around the ground truth, respectively, and the correlation between their offsets and the image attributes was computed in the same way. For these random annotators, the offsets show almost no correlation with any attribute, whereas the real annotators' offsets correlate strongly with particular attributes. In other words, individual preferences are systematic rather than noise.
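The analysis can be reproduced in miniature with the Python standard library. The sketch below computes each annotator's offsets from the per-image mean rating and correlates them with one attribute using Spearman's rank correlation, both for a hypothetical annotator with a built-in preference and for Random1/Random2-style annotators. Two assumptions: the 0.2 and 2 perturbations are modeled as Gaussian noise with those standard deviations, and ratings are not clipped to the 1-5 scale. All data is synthetic.

```python
import math
import random
from statistics import mean
from typing import Dict, List, Sequence


def rank(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)
n = 1000
gt = [rng.uniform(1, 5) for _ in range(n)]         # synthetic mean rating per image
attribute = [rng.uniform(0, 1) for _ in range(n)]  # synthetic attribute score per image

annotators: Dict[str, List[float]] = {
    # Hypothetical annotator whose offset grows with the attribute.
    "biased_user": [g + 1.5 * (a - 0.5) + rng.gauss(0, 0.3) for g, a in zip(gt, attribute)],
    # Random1 / Random2: ground truth plus Gaussian noise with std 0.2 and 2 (assumed).
    "Random1": [g + rng.gauss(0, 0.2) for g in gt],
    "Random2": [g + rng.gauss(0, 2.0) for g in gt],
}

results: Dict[str, float] = {}
for name, ratings in annotators.items():
    offsets = [r - g for r, g in zip(ratings, gt)]  # residual w.r.t. the ground truth
    results[name] = spearman(offsets, attribute)
    print(name, round(results[name], 3))
```

Because a random annotator's offsets are pure noise, their correlation with the attribute stays close to zero, while the biased annotator's offsets correlate strongly with it, which mirrors the contrast described above.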

Personalized Aesthetic Assessment Model PAM

Model Structure

The overall structure is as follows:

First, we train a deep neural network (the Generic Aesthetics Network) to predict the ground truth, i.e., the mean rating over all annotators.

Next, we compute each user's offset, i.e., the difference between the user's rating and the ground truth. The goal is to train a regression model that predicts this offset from an input image.
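Concretely, the offset is a per-image subtraction, and a personalized prediction adds a predicted offset back onto the generic score. A minimal sketch with made-up scores; the function names are hypothetical, and the `generic` values stand in for the Generic Aesthetics Network's output.

```python
from typing import Dict


def offset_targets(user_ratings: Dict[str, float], generic_scores: Dict[str, float]) -> Dict[str, float]:
    """Regression target per image: the user's rating minus the generic score."""
    return {img: user_ratings[img] - generic_scores[img] for img in user_ratings}


def personalized_score(generic_score: float, predicted_offset: float) -> float:
    """Personalized prediction = generic prediction + predicted user-specific offset."""
    return generic_score + predicted_offset


generic = {"img1": 3.2, "img2": 4.1, "img3": 2.5}
user = {"img1": 4.0, "img2": 3.6, "img3": 2.5}
targets = offset_targets(user, generic)
print({k: round(v, 2) for k, v in targets.items()})  # {'img1': 0.8, 'img2': -0.5, 'img3': 0.0}
```

Learning only the offset lets the generic model carry the part of the judgment that everyone shares, so the few ratings a user provides are spent on what makes that user different.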

However, each user provides only a small number of ratings, so regressing the offset directly from the images would likely be difficult. Therefore, we train separate neural networks (the Attributes network and the Contents network) to predict the aesthetic and content attributes of each input image, combine their outputs, and use support vector regression to predict the offset from the combined features. The personalized rating is then the generic score plus the predicted offset.
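As a rough sketch of this step, the code below concatenates two hypothetical feature vectors (standing in for the Attributes and Contents network outputs) and fits a linear epsilon-insensitive SVR by full-batch subgradient descent, using only the standard library. This minimal SVR is a swapped-in stand-in: the kernel and hyperparameters used in the paper are not given here, and in practice a library implementation such as scikit-learn's `SVR` would be the usual choice. The data and the "true" offset function are synthetic.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def concat_features(attributes: Sequence[float], contents: Sequence[float]) -> List[float]:
    """Combine the Attributes and Contents outputs into one vector by concatenation."""
    return list(attributes) + list(contents)


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sum(wk * xk for wk, xk in zip(w, x)) + b


def fit_linear_svr(
    X: List[List[float]], y: List[float], C: float = 100.0, epsilon: float = 0.05,
    lr: float = 0.001, epochs: int = 1000,
) -> Tuple[List[float], float]:
    """Linear epsilon-insensitive SVR trained by full-batch subgradient descent on
    0.5 * ||w||^2 + C * mean_i max(0, |y_i - (w.x_i + b)| - epsilon)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # start from the gradient of 0.5 * ||w||^2
        for xi, yi in zip(X, y):
            r = yi - predict(w, b, xi)
            if abs(r) > epsilon:  # only points outside the epsilon-tube contribute
                s = 1.0 if r > 0 else -1.0
                for k in range(d):
                    grad_w[k] -= C * s * xi[k] / n
                grad_b -= C * s / n
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


rng = random.Random(1)
X, y = [], []
for _ in range(80):
    attrs = [rng.random() for _ in range(3)]  # stand-in for Attributes network output
    conts = [rng.random() for _ in range(2)]  # stand-in for Contents network output
    X.append(concat_features(attrs, conts))
    y.append(1.0 * attrs[0] - 0.8 * conts[0])  # hypothetical user offset

w, b = fit_linear_svr(X, y)
mae = mean(abs(predict(w, b, x) - t) for x, t in zip(X, y))
baseline = mean(abs(mean(y) - t) for t in y)  # always predict the mean offset
print(round(mae, 3), round(baseline, 3))
```

The fitted error should come out well below the baseline of always predicting the mean offset. For a new image, `predict(w, b, features)` gives the predicted offset, which is added to the generic score.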

Active learning algorithm

In a real-world application that curates photos by learning a user's preferences, a system that actively learns those preferences would be effective. For this purpose, the paper uses active learning to collect user preferences efficiently.

To minimize the number of photos the user has to rate, the following two criteria are used:

(1) Select photos that cover as wide a range of styles as possible, with minimal redundancy.

(2) Select images with a large (predicted) offset between the user's rating and the ground truth, since they are relatively informative.

The algorithm is as follows:

For an image $p_i$ that the user has not yet rated, let $v_i$ be the combined outputs of the Attributes network and the Contents network, and let $r_i$ be the offset predicted from $v_i$ by support vector regression. Similarly, $v_j$ and $r_j$ are computed for each image $p_j$ that the user has already rated.

To satisfy criterion (1), the algorithm selects the unrated image $p_i$ whose $v_i$ has the largest sum of distances to the $v_j$ of the rated images (Equation 3 in the paper). To satisfy criterion (2), each candidate $p_i$ is weighted by the magnitude of its predicted offset $r_i$ (Equation 2 in the paper).
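A minimal sketch of this selection rule, assuming the two criteria are combined by multiplying a candidate's summed distance to the rated images by the magnitude of its predicted offset. This product form is an illustrative assumption; the paper's exact Equations 2 and 3 may differ, and the rated images' offsets $r_j$ are not used in this simplified score. Selection is greedy, so each chosen image counts as rated for the next pick. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def select_next(
    cand_features: List[Sequence[float]],
    cand_offsets: Sequence[float],
    rated_features: List[Sequence[float]],
) -> int:
    """Index of the candidate maximizing |r_i| * sum_j ||v_i - v_j|| (assumed score)."""
    best_i, best_score = -1, -math.inf
    for i, (v_i, r_i) in enumerate(zip(cand_features, cand_offsets)):
        # Criterion (1): far from what the user has already rated.
        # With nothing rated yet, fall back to ranking by offset alone.
        diversity = sum(math.dist(v_i, v_j) for v_j in rated_features) if rated_features else 1.0
        # Criterion (2): weight by the magnitude of the predicted offset.
        score = abs(r_i) * diversity
        if score > best_score:
            best_i, best_score = i, score
    return best_i


def select_batch(
    cand_features: List[Sequence[float]],
    cand_offsets: Sequence[float],
    rated_features: List[Sequence[float]],
    k: int,
) -> List[int]:
    """Greedily pick k candidates, treating each pick as rated for the next round."""
    remaining = list(range(len(cand_features)))
    rated = list(rated_features)
    chosen: List[int] = []
    for _ in range(min(k, len(remaining))):
        idx = select_next(
            [cand_features[i] for i in remaining], [cand_offsets[i] for i in remaining], rated
        )
        pick = remaining.pop(idx)
        chosen.append(pick)
        rated.append(cand_features[pick])
    return chosen


candidates = [(0.1, 0.0), (3.0, 4.0), (1.0, 0.0)]
offsets = [1.0, 0.1, 0.2]
print(select_batch(candidates, offsets, rated_features=[(0.0, 0.0)], k=3))  # [1, 0, 2]
```

In the toy run, the distant image is chosen first despite its small offset; once it is rated, the image with the large offset wins, because it is now also far from the rated set.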

Experimental results

Predictive performance of PAM model

The following table compares the proposed method (PAM) with two baselines: support vector regression that predicts aesthetic ratings directly from images, and a previous method, Feature-based Matrix Factorization (FPMF). The left and right column groups correspond to two numbers of training images (10 and 100) used to adapt the model to an individual's ratings. Each value is the improvement over a correlation coefficient of 0.514, which is the correlation between the individual's actual ratings and the predictions of the generic model trained on the ground-truth data.
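To make the reported numbers concrete, a gain of this kind is the difference between two correlations with a user's own ratings: one for the personalized predictions and one for the generic model's predictions. The sketch below assumes Spearman's rank correlation as the measure and uses made-up toy ratings; in a real evaluation the correlations would presumably be computed per test user and averaged, which is also an assumption here.

```python
import math
from statistics import mean
from typing import List, Sequence


def rank(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def correlation_gain(user_ratings, generic_preds, personalized_preds) -> float:
    """How much the personalized model raises the correlation over the generic one."""
    return spearman(personalized_preds, user_ratings) - spearman(generic_preds, user_ratings)


user = [5, 4, 2, 1, 3]                    # one user's own ratings (toy data)
generic = [4.0, 4.5, 2.0, 1.5, 3.5]       # generic model predictions (toy data)
personalized = [4.8, 4.1, 2.2, 1.1, 2.9]  # personalized predictions (toy data)
print(round(spearman(generic, user), 3), round(correlation_gain(user, generic, personalized), 3))  # 0.9 0.1
```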

This shows that the proposed method predicts an individual's aesthetic ratings more accurately than previous methods. It also shows that combining the aesthetic-attribute and content predictions of images works better than using either one alone for predicting personal aesthetic ratings.

Active learning algorithm

The performance of the proposed active learning algorithm is as follows:

The red line in the graph is the proposed active learning method, which shows high prediction performance on both the FLICKR-AES and REAL-CUR datasets.


The paper presented here proposed datasets, a model, and an active learning algorithm for predicting an individual's aesthetic ratings. It shows that predicting each individual's offset from a generic aesthetics model is an effective way to learn personal aesthetic preferences from a small amount of that individual's data. Further development in this field is expected.
