Facial Recognition Of Japanese Macaques In The Wild! New Animal Behavior Research

Face Recognition

3 main points
✔️ Identification of wild Japanese macaques using facial recognition technology, achieving over 80% accuracy
✔️ Suggests potential for new research methods in ecology and behavior
✔️ Simple method using existing models and libraries, feasible for researchers with no AI expertise

Deep Learning for Automatic Facial Detection and Recognition in Japanese Macaques: Illuminating Social Networks
written by Julien Paulet (UJM), Axel Molina (ENS-PSL), Benjamin Beltzung (IPHC), Takafumi Suzumura, Shinya Yamamoto, Cédric Sueur (IPHC, IUF, ANTHROPO LAB)
(Submitted on 10 Oct 2023)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Social and Information Networks (cs.SI)


The images used in this article are from the paper, the introductory slides, or were created based on them.

Introduction

This paper proposes a system that identifies individual Japanese macaques with face recognition technology and uses the results to evaluate their social network (sociability). In ecology and behavioral research, identifying individuals is essential for understanding complex social structures. Until now, most animal studies have relied on methods such as tagging, which cause the animals discomfort and affect them negatively. These methods also demand considerable time and effort from researchers.

Meanwhile, recent progress in deep learning has advanced face recognition technology, which is now in practical use in applications such as security cameras and immigration control. The paper therefore applies face recognition to identify individual Japanese macaques and to construct and evaluate the social network of the population.

This is a unique application of face recognition technology to animal behavior studies. The approach may prove useful for research that tracks individual animals, including Japanese macaques, and analyzes their social networks, and it may broaden the scope of future research.

Research and Results

The figure below shows the overall methodology of this paper. It consists of four stages: data collection, face detection, ID classification, and social network analysis.

The study was conducted over two months, from February to March 2023, on Koujima, a small island about 300 meters off the Nichinan coast near Kushima City, Miyazaki Prefecture. Koujima is surrounded by rocky coast and has a sandy beach called Odomari that stretches 100 meters across the island (Figure A below). The monkeys under study gather around the beach every day, especially in the morning (Figure B below). The study group consists of 42 animals in total, including 6 females, 11 males, and 5 juveniles whose sex has not yet been determined.

The island has long been known as a field research site, and the monkeys living there have individual names. Researchers visit the island one to three days a week to feed the monkeys and observe them continuously, and they weigh the monkeys regularly by luring them onto a scale with food (Figure C below). In addition, because tourists visit the island regularly, the monkeys are relatively accustomed to humans.

The Japanese macaques on Koujima were filmed daily for two months, from February to March 2023. On arriving at the island, the researchers fed the monkeys immediately and then filmed them while they were at ease, grooming and resting on the beach. Care was taken to collect as many face images suitable for face recognition as possible, including clear frontal views and faces from various angles. In total, 370 videos amounting to about 15 hours were recorded over the two-month observation period.

From these videos, a total of 5,985 frames were extracted and annotated to create training data for the face detection model. In addition, 642 frames extracted at a rate of one frame per second from the YouTube video "Japanese Macaques Look Almost Human" were added, creating a total dataset of 6,622 frames. For annotation, a bounding box labeled "macaque" was drawn around each monkey's face. Data augmentation was then applied, ultimately increasing the data volume to a total of 17,772 frames.
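Sampling a video at one frame per second amounts to keeping roughly every N-th frame, where N is the native frame rate. The following is a minimal sketch of that index calculation, assuming a constant native frame rate; the function name `frame_indices_at_rate` and its parameters are illustrative and do not come from the paper.

```python
def frame_indices_at_rate(total_frames: int, native_fps: float,
                          target_fps: float = 1.0) -> list[int]:
    """Return the indices of the frames to keep when sampling at target_fps.

    Hypothetical helper: assumes the video has a constant native frame rate.
    """
    if total_frames < 0 or native_fps <= 0 or target_fps <= 0:
        raise ValueError("frame count must be >= 0 and frame rates must be > 0")
    # Number of native frames between two kept frames (never less than one,
    # so that no frame is selected twice).
    step = max(1.0, native_fps / target_fps)
    indices = []
    k = 0
    while int(k * step) < total_frames:
        indices.append(int(k * step))
        k += 1
    return indices


# A 3-second clip recorded at 30 fps, sampled at 1 frame per second.
print(frame_indices_at_rate(90, 30.0))  # [0, 30, 60]
```

The selected indices could then be passed to any video reader to save the corresponding frames for annotation.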

Face detection uses a convolutional neural network (CNN) that was pre-trained on the COCO dataset and then fine-tuned on the Japanese macaque dataset described above. The face detection results are shown in the figure below.

The model achieves an Average Precision of 82.2% at an IoU threshold of 0.5. IoU (Intersection over Union) measures how much the box that marks the model's predicted face location (the detection box) overlaps with the box drawn manually by a human (the annotated box).
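To make the overlap measure concrete, here is a minimal sketch of IoU for two axis-aligned boxes. It assumes boxes are given as `(x_min, y_min, x_max, y_max)` tuples; that box format and the function name `iou` are assumptions for illustration, not details taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    # Union = sum of both areas minus the part counted twice.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


detection = (0, 0, 10, 10)   # box predicted by the model
annotation = (0, 0, 10, 5)   # box drawn by a human annotator
score = iou(detection, annotation)
print(score)         # 0.5
print(score >= 0.5)  # True: counts as a correct detection at the 0.5 threshold
```

A detection is counted as correct when its IoU with an annotated box reaches the chosen threshold, which is 0.5 for the figure reported above.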

Next comes individual identification. To create the training data, the initial dataset of 5,985 frames was randomly reduced by half to prevent the model from overfitting to similar images. A further 1,210 frames were manually selected from various videos and added to the dataset so that it would include images with a wider variety of facial expressions, backgrounds, and so on. Using a tool called Roboflow, the dataset was annotated in the same way as the face detection dataset, except that every bounding box (the frame indicating the location of a monkey's face in the image) was manually labeled with the name of the corresponding monkey, giving 42 different classes (individual names). Data augmentation was then applied to create a final dataset of 5,956 images, and a convolutional neural network (CNN) model was trained on this dataset.

The figure below evaluates how accurately this model identifies each individual.

The Top-1 accuracy (left side of the figure) shows that the name the model ranks as most likely matches the actual individual more than 80% of the time. The Top-5 accuracy (right side of the figure) shows that the correct name is among the model's five most likely predictions more than 90% of the time.
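The difference between the two accuracy figures can be illustrated with a short sketch. It assumes the classifier returns a score per individual for each face image; the data layout, the function name `top_k_accuracy`, and the placeholder names "A", "B", and "C" are hypothetical and are not taken from the paper.

```python
def top_k_accuracy(scores, true_names, k):
    """Fraction of images whose true name is among the k highest-scored names.

    scores: one dict per image, mapping individual name -> predicted score.
    true_names: the correct individual name for each image.
    """
    if not true_names:
        raise ValueError("at least one labeled image is required")
    hits = 0
    for class_scores, truth in zip(scores, true_names):
        # Names ordered from most to least likely according to the model.
        ranked = sorted(class_scores, key=class_scores.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_names)


# Toy predictions for three face images (placeholder names and scores).
predictions = [
    {"A": 0.7, "B": 0.2, "C": 0.1},
    {"A": 0.5, "B": 0.4, "C": 0.1},
    {"A": 0.1, "B": 0.3, "C": 0.6},
]
truth = ["A", "B", "A"]
print(top_k_accuracy(predictions, truth, k=1))  # 1 of 3 images correct
print(top_k_accuracy(predictions, truth, k=2))  # 2 of 3 images correct
```

With 42 candidate individuals, Top-5 accuracy tells a researcher how often a short list of five candidates is enough to find the right monkey by hand.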

The final step is social network analysis. The goal is to generate network diagrams automatically in the future, but in this case the calculations were performed manually for comparison purposes. The first stage is to analyze co-occurrences, that is, which individuals are captured in the same video, then calculate the degree of association between each pair of individuals and represent the result as a social network. The degree of association between two individuals reflects how often one individual appears in the same video as the other, and it is used as a measure of physical proximity. In other words, how often two individuals are seen together is used to quantify how strongly they are associated. The calculation uses the "simple ratio" method, which divides the number of times two individuals appear together by the total number of occurrences of both individuals in the dataset.
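The "simple ratio" calculation can be sketched as follows. The sketch assumes the common formulation of the simple ratio index, in which the denominator is the number of videos where at least one of the two individuals appears; the article's wording leaves the exact denominator open, so treat this as an interpretation rather than the paper's definition. The input format, function name, and placeholder names are likewise illustrative.

```python
from itertools import combinations


def simple_ratio_index(videos):
    """Association index for every pair of individuals.

    videos: list of sets, each holding the names identified in one video.
    Returns a dict mapping (name_a, name_b) -> index in [0, 1].
    Assumption: index = videos with both / videos with at least one of the two.
    """
    appearances = {}  # name -> set of video numbers in which it appears
    for video_id, names in enumerate(videos):
        for name in names:
            appearances.setdefault(name, set()).add(video_id)

    index = {}
    for a, b in combinations(sorted(appearances), 2):
        together = len(appearances[a] & appearances[b])
        either = len(appearances[a] | appearances[b])
        index[(a, b)] = together / either
    return index


# Four toy videos with placeholder individual names.
videos = [{"A", "B"}, {"A"}, {"B", "C"}, {"A", "B", "C"}]
for pair, value in simple_ratio_index(videos).items():
    print(pair, round(value, 3))
```

A value of 1 means the two individuals were only ever seen together, and a value of 0 means they never shared a video.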

In addition, two metrics are computed to characterize the social connectedness of the group as a whole. The first is network density, which is the ratio of the links (connections) actually present in the network to the total number of theoretically possible links. It gives a numerical indication of how closely connected the network is. The second is global efficiency, a measure of how easily information can travel through the network, based on the minimum number of connections separating each pair of individuals. Higher efficiency means that information can spread through fewer connections. For each individual, the analysis also calculates sociability values: "frequency" (the degree, that is, how many other individuals the individual is associated with), "intensity" (the strength, that is, the sum of all the association indices the individual has), and "eigenvector centrality" (how central the individual is in the network). Finally, to represent this information visually, a software package called "igraph" is used to draw the social network of the population. The graph uses a layout algorithm called GEM (graph embedding) to show the relationships between individuals spatially (see figure below).
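The group-level and individual-level measures described above can be sketched with the Python standard library alone. This is not the paper's implementation (which uses igraph); it assumes an undirected network in which any pair with an association index above zero is linked, and it computes global efficiency on unweighted shortest paths. All function names and the toy data are illustrative.

```python
from collections import deque


def build_adjacency(nodes, weights):
    """Undirected adjacency map; a pair is linked when its index is > 0."""
    adj = {n: {} for n in nodes}
    for (a, b), w in weights.items():
        if w > 0:
            adj[a][b] = w
            adj[b][a] = w
    return adj


def density(adj):
    """Existing links divided by the number of possible links."""
    n = len(adj)
    links = sum(len(neighbors) for neighbors in adj.values()) / 2
    possible = n * (n - 1) / 2
    return links / possible if possible else 0.0


def global_efficiency(adj):
    """Mean of 1 / (shortest path length) over all ordered pairs of nodes.

    Pairs with no connecting path contribute 0.
    """
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for source in adj:
        dist = {source: 0}  # breadth-first search from source
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for node, d in dist.items() if node != source)
    return total / (n * (n - 1))


def degree_and_strength(adj):
    """Per individual: (number of partners, sum of association indices)."""
    return {n: (len(nb), sum(nb.values())) for n, nb in adj.items()}


# Toy network with placeholder names; "D" has no associations.
nodes = ["A", "B", "C", "D"]
weights = {("A", "B"): 0.5, ("B", "C"): 0.25}
adj = build_adjacency(nodes, weights)
print(round(density(adj), 3), round(global_efficiency(adj), 3))
print(degree_and_strength(adj)["B"])  # (2, 0.75)
```

Eigenvector centrality and the GEM layout are left to a graph library such as igraph, since both are iterative procedures that are better not reimplemented by hand.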

A total of 276 pairwise co-occurrences between individuals were observed in the analysis. The resulting association network has a density of 0.173 and a global efficiency of 0.508. The individuals with the highest frequencies and intensities are mostly young (detailed results can be found in the Supplementary Material of the paper). These results suggest that social network analysis based on face recognition is useful.


The paper develops an AI pipeline that automates face detection and identification of Japanese macaques. The pipeline recognizes macaque faces in videos and builds a social network from the results. Because it relies on existing models and libraries and is simple to implement, it could be of great benefit to researchers without expertise in AI.

In the future, the authors plan to fine-tune and improve the pipeline so that it can be applied not only to the Japanese macaques of Koujima but also to other populations, contributing to long-term research. The tool would allow comparative studies of social dynamics among different Japanese macaque populations, and it is expected to contribute to a wider range of research, including studies of cultural diversity among Japanese macaques.