Wild Selfie Dataset (WSD), A Dataset For Facial Recognition Of Selfie Images

3 main points
✔️ Proposed a new dataset "WSD" for face recognition in selfie images
✔️ WSD offers high diversity in real-world conditions such as lighting, viewpoint, blurring, and reflection
✔️ Comparative validation with existing datasets in face detection and face recognition tasks

WSD: Wild Selfie Dataset for Face Recognition in Selfie Images
written by Laxman Kumarapu, Shiv Ram Dubey, Snehasis Mukherjee, Parkhi Mohan, Sree Pragna Vinnakoti, Subhash Karthikeya
(Submitted on 14 Feb 2023)
Subjects: Computer Vision and Pattern Recognition (cs.CV)


The images used in this article are from the paper, the introductory slides, or were created based on them.


 Demand for selfies is growing, driven in part by the spread of online identity verification. However, because selfies are taken close to the camera, the face tends to occupy more of the frame than in ordinary images, and effects are often added by photo-editing applications.

 This paper therefore proposes a new dataset, the Wild Selfie Dataset (WSD), specialized for face recognition in selfie images. Unlike conventional datasets, WSD reflects real scenarios and covers conditions such as image processing, distortion, blurring, and backlighting.

What is WSD (Wild Selfie Dataset)?

 The Wild Selfie Dataset (WSD) is a dataset for face recognition on selfie images. It consists of 45,424 face images from 42 individuals (24 females and 18 males) between the ages of 18 and 31. Of these, 40,862 are training images and 4,562 are test images.

 The images were captured under a very wide variety of conditions to reproduce real-life selfie scenarios. The figure below shows samples from WSD, which include images with AR filters, reflections, blur, partially hidden faces, different lighting conditions, different scales, different facial expressions, different alignments, different camera viewpoints, and different aspect ratios.

 The average number of images per person is 1,082, with a minimum of 518 and a maximum of 2,634. The figures below show the distribution of the number of images per subject for the training data (left) and the test data (right). The training and test data follow the same distribution across subjects.
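
 The counts quoted above are mutually consistent, which a few lines of arithmetic confirm. This is only a sanity check on the figures given in this article; the constant names are illustrative and nothing here is measured from the dataset itself.

```python
# Figures as quoted in the article; not independently measured.
TRAIN_IMAGES = 40_862
TEST_IMAGES = 4_562
SUBJECTS = 42

total_images = TRAIN_IMAGES + TEST_IMAGES      # should match the quoted 45,424
mean_per_subject = total_images / SUBJECTS     # should round to the quoted 1,082
test_share = TEST_IMAGES / total_images        # fraction of images held out for testing

print(total_images)              # 45424
print(round(mean_per_subject))   # 1082
print(f"{test_share:.1%}")       # 10.0%
```

 In other words, roughly one image in ten is held out for testing.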

 The images in the dataset come from selfie photos and selfie videos submitted by project collaborators. Selfie images were taken with the front or rear camera of a smartphone, using a selfie stick or similar device, or with a laptop camera. Selfie videos were recorded with the front-facing camera of a smartphone. Each collaborator has entered into an agreement permitting use of the images for non-profit research and development purposes.

 After the selfie videos were collected, files in unsupported formats and corrupted videos were deleted. The remaining videos were split into frames using the multimedia framework FFmpeg, with a varying number of frames extracted for each facial expression, lighting situation, background, and so on, so that all data is in image format. The images were then compared pixel by pixel to eliminate duplicates.
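
 The frame extraction and duplicate removal steps could look roughly like the sketch below. The article does not give the authors' exact FFmpeg options or deduplication code, so the sampling rate, the output file pattern, and both function names are assumptions made for illustration. Duplicates are treated here as frames whose decoded pixel buffers are byte-for-byte identical, which assumes all frames being compared share the same resolution and pixel format.

```python
import hashlib
from pathlib import Path


def ffmpeg_extract_command(video: Path, out_dir: Path, fps: float = 1.0) -> list[str]:
    """Build (but do not run) an FFmpeg command that writes frames as PNG files.

    The fps value and the output naming pattern are illustrative choices,
    not settings reported for WSD.
    """
    return [
        "ffmpeg",
        "-i", str(video),
        "-vf", f"fps={fps}",                 # sample this many frames per second
        str(out_dir / "frame_%06d.png"),     # numbered output images
    ]


def drop_exact_duplicates(frames: list[tuple[str, bytes]]) -> list[str]:
    """Return the names of frames to keep, dropping exact pixel duplicates.

    Each frame is a (name, raw_pixel_bytes) pair. The first frame seen for
    each distinct pixel buffer is kept; later identical ones are dropped.
    """
    seen: set[bytes] = set()
    kept: list[str] = []
    for name, pixels in frames:
        digest = hashlib.sha256(pixels).digest()
        if digest not in seen:
            seen.add(digest)
            kept.append(name)
    return kept
```

 The command list can be passed to `subprocess.run` on a machine where FFmpeg is installed. Hashing the pixel buffer avoids comparing every pair of images directly, but it only catches exact matches; near-duplicates from consecutive video frames would need a separate similarity measure.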

 Next, the images are annotated for the face detection and face recognition tasks. For face detection, Dlib is used to detect faces and obtain the upper-left and lower-right coordinates of each face bounding box. These coordinates are used to calculate the width and height of the box, so the final annotation consists of the upper-left coordinates (X, Y), width (W), and height (H). However, because the automatic output includes boxes that contain no face, boxes that overlap other boxes by a large amount, and faces that were missed altogether, the annotations were also checked and corrected manually. For the face recognition task, the 42 individuals who contributed data were assigned IDs 01 through 42.
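
 The annotation format described above can be illustrated with a short sketch. It converts corner coordinates into (X, Y, W, H) and computes intersection over union (IoU), one common way to flag boxes that overlap heavily before manual review. Both function names are hypothetical, the plain subtraction used for width and height is an assumption (some libraries count pixels inclusively and add 1), and the article does not say which overlap measure the authors used.

```python
def corners_to_xywh(x1: float, y1: float, x2: float, y2: float) -> tuple:
    """Convert upper-left (x1, y1) and lower-right (x2, y2) corners to (X, Y, W, H)."""
    return (x1, y1, x2 - x1, y2 - y1)


def iou_xywh(a: tuple, b: tuple) -> float:
    """Intersection over union of two boxes given as (X, Y, W, H)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


box = corners_to_xywh(10, 20, 110, 220)
print(box)  # (10, 20, 100, 200)
```

 A pair of detections in the same image with a high IoU would be a natural candidate for the manual check mentioned above; the threshold to use is a design choice that the article does not specify.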

 Finally, the data distribution is examined using head pose estimation to analyze camera motion. The orientation of the head is determined by the position and alignment of the face. The camera rotates around the X, Y, and Z axes, with three corresponding angles: yaw, pitch, and roll, respectively. OpenCV and six landmarks (the left corner of the left eye, the right corner of the right eye, the left corner of the mouth, the right corner of the mouth, the tip of the nose, and the tip of the chin) are used to estimate the head orientation in each image. The resulting data distribution for WSD is shown in the figure below.
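
 The article does not detail how the three angles are obtained from the landmarks. A common approach with OpenCV is to fit a generic 3D face model to the six 2D landmarks with `cv2.solvePnP`, convert the resulting rotation vector to a rotation matrix with `cv2.Rodrigues`, and then read the angles off the matrix. The sketch below covers only that last step in plain Python. It assumes the rotation is composed as R = Rz · Ry · Rx; which axis is labelled yaw, pitch, or roll depends on the convention in use, so the code names the angles by axis. The function names are hypothetical.

```python
import math


def rotation_from_angles(ax: float, ay: float, az: float) -> list:
    """Build R = Rz(az) * Ry(ay) * Rx(ax) from angles in radians."""
    sx, cx = math.sin(ax), math.cos(ax)
    sy, cy = math.sin(ay), math.cos(ay)
    sz, cz = math.sin(az), math.cos(az)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def angles_from_rotation(R: list) -> tuple:
    """Recover (ax, ay, az) in radians from R = Rz * Ry * Rx."""
    cos_ay = math.hypot(R[0][0], R[1][0])
    ay = math.atan2(-R[2][0], cos_ay)
    if cos_ay > 1e-9:
        ax = math.atan2(R[2][1], R[2][2])
        az = math.atan2(R[1][0], R[0][0])
    else:
        # Gimbal lock: ax and az are coupled, so fix az = 0 by convention.
        ax = math.atan2(-R[1][2], R[1][1])
        az = 0.0
    return (ax, ay, az)


# Round trip: angles -> matrix -> angles.
original = tuple(math.radians(d) for d in (10.0, -20.0, 30.0))
recovered = angles_from_rotation(rotation_from_angles(*original))
print([round(math.degrees(a), 6) for a in recovered])
```

 Collecting these angles over all images gives the kind of pose distribution that the figure summarizes.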

Comparison with existing datasets

 The table below compares the types of data included in WSD with those in existing datasets. WSD is the only dataset that includes blurred images, mirror reflections, and selfies with AR filters, which makes it highly diverse compared to the existing datasets.

 The table below also compares the number of subjects, number of images, public/private status, and annotation content of WSD and existing datasets. WSD has fewer subjects and samples, but its images were taken by the subjects themselves in an unconstrained environment (similar to a real scenario). Existing datasets, by contrast, consist mostly of images crawled from the Internet under limited conditions.

 In addition, WSD was collected with prior consent for research use and is the only publicly available dataset.

Face Detection Performance Comparison

 The table below compares face detection performance (mAP) on WSD and existing datasets using YOLOv3 and MTCNN. WSD shows higher values for both models than FDDB and WIDER FACE.

 This is attributed to differences in the nature of the images in WSD and the existing datasets. Because selfies are taken at close range, the face occupies a large proportion of the image and is therefore considered easier to detect than in non-selfie datasets. Most WSD images also contain only one face, which is considered another reason why detection is easier.

 However, detection appears to work poorly when the face is not clearly visible or is strongly backlit, as shown in the figure below. Many incorrect detections are also reported for images containing AR filters. In particular, detection accuracy seems to decrease when there are obstructions near the eyes. Handling cases where non-face objects are included in the bounding box is another open issue in face detection for selfie images.

Face Recognition Performance Comparison

 The table below compares face recognition performance on WSD and existing datasets using VGGFace, VGGFace2, and FaceNet. Performance on WSD is significantly lower than on the existing datasets.

 This is because WSD consists of data tailored to real-world scenarios, with variation in lighting conditions, AR filters, occlusion, scale, blur, and face orientation, making it more diverse than existing datasets. The figure below shows a case where recognition failed for all face recognition models.


 This paper proposes a new dataset, the Wild Selfie Dataset (WSD), for face recognition on selfie images, which are becoming increasingly popular. It is more diverse than existing datasets because it contains images captured under various real-world conditions (AR filters, mirror reflections, blurring, occlusion, illumination changes, scaling, etc.).

 WSD is also used to evaluate face detection and face recognition performance. For face detection, YOLOv3 and MTCNN both achieve high performance (mAP), although severe illumination changes and obstructions remain challenging. For face recognition, VGGFace, VGGFace2, and FaceNet show significantly lower accuracy on WSD than on existing datasets. The authors attribute this to the fact that many of the images were taken and processed under uncontrolled conditions that reflect real-world scenarios.

 This dataset may help in building more accurate face recognition models that handle image processing, camera shake, backlighting, and other real-world factors that make face recognition on selfie images challenging.
