Interesting Discovery: Blind AI Learns To Map Its Environment

Reinforcement Learning

3 main points
✔️ Even blind animals such as mole rats have been reported to construct maps of their environment and select appropriate routes
✔️ This paper investigates whether, similarly, a blind artificial intelligence given only a GPS and a compass can acquire a map of its environment and select appropriate routes

✔️ A variety of experiments show that a map is created in the memory of a blind artificial intelligence

Emergence of Maps in the Memories of Blind Navigation Agents
written by Erik Wijmans, Manolis Savva, Irfan Essa, Stefan Lee, Ari S. Morcos, Dhruv Batra
(Submitted on 30 Jan 2023)
Comments: Accepted to ICLR 2023
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)

The images used in this article are from the paper, the introductory slides, or were created based on them.

Introduction

Intelligent animals such as hamsters, wolves, chimpanzees, and bats have been reported to learn maps of their environment and to select appropriate routes.

It is therefore considered important for a robot to locate itself and map its environment in order to act intelligently.

On the other hand, when machine learning models such as neural networks are used to solve the task of moving toward a goal, the goal can be reached successfully without any explicit mapping mechanism.

How this is possible has yet to be elucidated, and understanding this mechanism is important from both academic and practical perspectives.

The paper presented here therefore investigates whether the mapping mechanism is an emergent phenomenon.

The novelty of this paper lies in its focus on whether an AI can learn a map on its own (i.e., whether map emergence occurs) when it is merely rewarded for progress toward the goal under the very strict condition of blindness, and in the diverse, effective experiments it proposes to clarify this point. The findings of this paper are as follows.

  1. Remarkable ability to plan actions: reaches goals remarkably well in new environments (about 95% success rate)
  2. Long-term memory utilization: memorizes approximately 1,000 steps of past experience within an episode
  3. Manifestation of intelligent behavior: learns shortcuts
  4. Emergence of environment maps and collision detection neurons: the AI's learned internal representations suggest maps of the environment and collision-detecting neurons
  5. Selective, task-dependent maps: exploratory detours are forgotten

This paper was selected as an Outstanding Paper at ICLR 2023.

Remarkable ability to plan actions

A 3D replica of a real house was used to evaluate the AI's performance on the action planning task. The AI is embodied as a cylinder 0.2 m in diameter and 1.5 m high, which is simulated moving inside the replica house.

Problem-solving

In each episode, the AI's environment is randomly initialized. The goal point is (xg, yg, zg), and the AI can perform four actions: move forward 0.25 m, turn left 10 degrees, turn right 10 degrees, and declare the goal reached. A maximum of 2,000 actions is allowed per episode.

An example of the environment in which the AI acts is shown in Figure 1. The blue cube in Figure 1 is the starting point and the red cube is the goal. To test generalization rather than memorization, we evaluate whether the AI can reach the goal in an environment it has never seen before.

The AI is given its relative position to the goal (Δx, Δy, Δz) and the relative direction of the goal (Δθ); this amounts to being given a GPS and a compass. Although the mole rat is a blind animal, it is said to locate itself by integrating the path it has taken and by sensing the earth's magnetic field, which is likewise akin to having a GPS and a compass.
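As a concrete illustration, the GPS+Compass observation can be computed from the agent's pose and the goal position roughly as follows (a minimal sketch with our own function name and coordinate conventions; the simulator's actual conventions may differ):

```python
import numpy as np

def gps_compass(agent_pos, agent_heading, goal_pos):
    """Sketch of a GPS+Compass sensor: relative goal offset and bearing.

    agent_pos, goal_pos: (x, y, z) world coordinates
    agent_heading:       agent yaw in radians, measured in the x-y plane
    """
    delta = np.asarray(goal_pos) - np.asarray(agent_pos)  # (dx, dy, dz)
    bearing_world = np.arctan2(delta[1], delta[0])        # absolute bearing
    # Relative bearing to the goal, wrapped to (-pi, pi]
    d_theta = (bearing_world - agent_heading + np.pi) % (2 * np.pi) - np.pi
    return delta, d_theta
```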

Figure 1: Action planning task

For this evaluation of action planning capability, two evaluation metrics were set. One is "Success": an episode is a success if the AI declares the goal reached within 0.2 m of the goal. The other is "Success weighted by Path Length" (SPL), a metric under which shorter successful paths score higher, i.e., more efficient behavior.
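For reference, SPL is commonly computed as follows (a sketch of the standard metric from the embodied navigation literature, not code from the paper): each episode's success flag is weighted by the ratio of the shortest-path length to the length actually travelled, and the result is averaged over episodes.

```python
import numpy as np

def spl(successes, shortest_lengths, actual_lengths):
    """Success weighted by Path Length, averaged over episodes.

    successes:        0/1 success flag per episode
    shortest_lengths: geodesic shortest-path length from start to goal
    actual_lengths:   length of the path the agent actually travelled
    """
    s = np.asarray(successes, dtype=float)
    l = np.asarray(shortest_lengths, dtype=float)
    p = np.asarray(actual_lengths, dtype=float)
    return float(np.mean(s * l / np.maximum(p, l)))
```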

AI Algorithms

The AI's behavior is determined by a Long Short-Term Memory (LSTM) model. The goal location (xg, yg, zg), the relative position to the goal (Δx, Δy, Δz), the relative direction of the goal Δθ, and an index of proximity to the goal, min(||(Δx, Δy, Δz)||, 0.5), are given as inputs to the LSTM. Each input is mapped to 32 dimensions and concatenated with a 32-dimensional embedding of the previously taken action, yielding a 160-dimensional LSTM input. The LSTM's output is fed into fully connected layers, which output an estimated distribution over the action space and a value function. The LSTM's parameters are optimized with the reinforcement learning algorithm PPO (Proximal Policy Optimization), with progress toward the goal as the reward.
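Put as code, the architecture described above looks roughly like the following PyTorch sketch. The class and variable names are ours, and the 512-dimensional hidden state is an assumption; only the 32-dimensional embeddings, the 160-dimensional LSTM input, and the actor-critic heads come from the description above.

```python
import torch
import torch.nn as nn

class BlindNavPolicy(nn.Module):
    """Sketch of the blind agent's actor-critic: four goal-related inputs
    plus the previous action feed a 160-d LSTM; fully connected heads
    output action logits and a value estimate (trained with PPO)."""

    def __init__(self, num_actions=4, hidden=512):
        super().__init__()
        self.goal_emb = nn.Linear(3, 32)   # goal location (xg, yg, zg)
        self.rel_emb  = nn.Linear(3, 32)   # relative position (dx, dy, dz)
        self.dir_emb  = nn.Linear(1, 32)   # relative goal direction
        self.dist_emb = nn.Linear(1, 32)   # clipped distance to goal
        self.act_emb  = nn.Embedding(num_actions + 1, 32)  # previous action
        self.lstm     = nn.LSTM(4 * 32 + 32, hidden, batch_first=True)
        self.policy   = nn.Linear(hidden, num_actions)
        self.value    = nn.Linear(hidden, 1)

    def forward(self, goal, rel, heading, dist, prev_action, state=None):
        x = torch.cat([
            self.goal_emb(goal), self.rel_emb(rel),
            self.dir_emb(heading), self.dist_emb(dist),
            self.act_emb(prev_action),
        ], dim=-1)                     # (batch, seq, 160)
        out, state = self.lstm(x, state)
        return self.policy(out), self.value(out), state
```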

Assessment Results

For comparison, we consider the insect-inspired action planning algorithm Bug, which basically goes straight toward the goal and, when it hits a wall, follows the wall.

When following a wall, there is a choice between going left or right, so Bug-Always Right and Bug-Always Left are both evaluated. Furthermore, as an idealized Bug algorithm, the Clairvoyant Bug chooses left or right, whichever minimizes the distance to the goal (the Bug cannot learn which way to go, but this shows its performance if it could). A toy version of the decision rule is sketched below.
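The following is our own illustration of such a controller, not the paper's implementation; a real Bug algorithm needs a more careful wall-following rule:

```python
class BugController:
    """Minimal Bug-style controller over the agent's three movement
    actions. Positive bearing means the goal is to the left."""

    def __init__(self, side="right"):
        self.side = side        # turning direction when a wall is hit
        self.following = False  # currently wall-following?

    def act(self, bearing_to_goal_deg, last_move_collided, goal_dir_clear):
        if self.following and goal_dir_clear:
            self.following = False           # leave the wall, head for goal
        if not self.following:
            if last_move_collided:
                self.following = True        # hit a wall: start following it
            elif abs(bearing_to_goal_deg) > 10:
                return "turn_left" if bearing_to_goal_deg > 0 else "turn_right"
            else:
                return "forward"
        # Wall-following: rotate away from the wall after a collision,
        # otherwise keep hugging it by moving forward.
        if last_move_collided:
            return "turn_right" if self.side == "right" else "turn_left"
        return "forward"
```

The Clairvoyant Bug would run this controller with both side="left" and side="right" and keep whichever path turns out shorter.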

An example comparing the action paths of Bug-Always Right (black with white), Bug-Always Left (orange), Clairvoyant Bug (light blue), and the AI (Agent, blue) is shown in Figure 2.

Figure 2: Differences in action paths by method

In this example, the stripe-like paths are those of the Clairvoyant Bug, whose route overlaps with Bug-Always Right and Bug-Always Left; the Clairvoyant Bug reaches the goal very efficiently. When following walls only to the left or only to the right, some paths take a detour on the way to the goal. The AI (Agent), on the other hand, selects a comparatively smooth route.

Table 1 shows the performance compared to an AI equipped with a depth sensor, which can be considered sighted (Sighted).

Table 1: Comparison with a sighted AI (Sighted)

The blind AI (Blind) achieves a higher success rate in reaching the goal (Success) and a more efficient path (SPL) than the sighted AI (Sighted), although it falls short of the Clairvoyant Bug.

Utilization of long-term memory

We investigated how the AI utilizes memory. Specifically, we looked at the relationship between memory length and the performance metrics to determine whether the AI relies on short-term memory (e.g., whether a collision occurred in the most recent step) or long-term memory (e.g., whether a collision occurred several hundred steps ago).

Figure 3 shows the relationship between memory length and the performance metrics, evaluated by constraining the LSTM so that it cannot use information from more than a given number of past steps.

Figure 3: Relationship between memory length and performance metrics.

Performance on both evaluation metrics, SPL and Success, does not saturate until the memory length reaches about 1,000 steps. In other words, the AI appears to achieve its performance by leveraging long-term memory.
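The ablation idea can be sketched as follows: to evaluate the agent with a memory of at most k steps, only the last k observations are replayed through a freshly initialized LSTM state. This assumes the BlindNavPolicy interface sketched earlier; the paper's exact procedure may differ.

```python
def act_with_truncated_memory(policy, obs_history, k):
    """Act as if the agent only remembered the last k steps.

    obs_history: list of per-step input tuples for the policy
                 (goal, rel, heading, dist, prev_action)
    """
    assert k >= 1
    state = None  # fresh memory: everything older than k steps is forgotten
    for obs in obs_history[-k:]:
        logits, value, state = policy(*obs, state=state)
    return logits[:, -1, :]  # action distribution at the current step
```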

Expression of intelligent behavior

To investigate what kind of information the AI remembers, a probe experiment was conducted. The AI was first asked to plan its actions from the start point (the green sphere in Figure 4) to the goal point (the red sphere); it was then asked to reach the goal from the start point again while inheriting the memory of the run that had already reached the goal.

Figure 4: Probe experiment

As a result, the path the AI took first is the blue path, but when it tried to reach the goal from the starting point again, it took the purple path. We can see that it takes shortcuts: although it is supposed to be blind, it chose its path as if it could see.

Emergence of environment maps and collision detection neurons

If the presence or absence of obstacles can be decoded from the AI's learned neurons, a map of the environment can be created (this is equivalent to a map because an obstacle-free path is a passable path, and the map here shows passable paths). The map extracted from the AI's memory by predicting the presence or absence of obstacles is shown in Figure 5.

Figure 5: Correct and predicted presence of obstacles

Two examples, A and B, are shown in Figure 5; the Ground Truth and the Prediction are broadly similar.
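Such a map-decoding probe can be sketched as follows: a small decoder is trained to predict a local occupancy grid from the frozen LSTM hidden state, while the agent itself stays fixed. The names, grid size, and layer widths here are our assumptions, not the paper's.

```python
import torch
import torch.nn as nn

class OccupancyProbe(nn.Module):
    """Predict a local obstacle/free grid from the agent's hidden state."""
    def __init__(self, hidden=512, grid=32):
        super().__init__()
        self.grid = grid
        self.decode = nn.Sequential(
            nn.Linear(hidden, 1024), nn.ReLU(),
            nn.Linear(1024, grid * grid),
        )

    def forward(self, h):
        return self.decode(h).view(-1, self.grid, self.grid)

probe = OccupancyProbe()
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()  # per-cell obstacle vs. free

# h: frozen hidden states from rollouts; occ: ground-truth occupancy.
# Random stand-ins keep the sketch self-contained.
h = torch.randn(8, 512)
occ = torch.randint(0, 2, (8, 32, 32)).float()
loss = loss_fn(probe(h), occ)
loss.backward()
opt.step()
```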

Next, we examined how collisions are structured in the AI's internal representation (its LSTM neurons). A sparse linear classifier was trained to classify collision presence/absence using the AI's trained neurons as features, and the 10 neurons most influential for this classification were extracted. These were then reduced to a 2-dimensional feature space with t-SNE and the behaviors were clustered; the results are shown in Figure 6.
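In scikit-learn terms, this analysis might look roughly like the following (synthetic stand-in data; the regularization strength and t-SNE settings are our choices, not the paper's):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.manifold import TSNE

# Hidden states from the trained agent and 0/1 collision labels
# (random stand-ins; in practice these come from rollouts).
H = np.random.randn(2000, 512)
y = np.random.randint(0, 2, size=2000)

# Sparse (L1-regularized) linear probe: collision vs. no collision.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(H, y)

# The 10 neurons with the largest absolute weights ...
top10 = np.argsort(np.abs(clf.coef_[0]))[-10:]

# ... embedded in 2-D with t-SNE for the clustering shown in Figure 6.
emb = TSNE(n_components=2, perplexity=30).fit_transform(H[:, top10])
```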

Figure 6: Clustering of AI's internal representation of collisions

Colors indicate collision status: green for no collision and red for collision. Arrows indicate the previous action: Forward for moving forward, Turn Right for turning right, and Turn Left for turning left. Clustering yielded the following clusters: Forward-No Collision, Forward-Collided, and Turn-No Collision (two clusters). The numbers and corresponding images represent scenes.

The emergence of a Forward-Collided cluster and a Forward-No Collision cluster suggests that neurons detecting whether a forward move caused a collision have been expressed.

Selective, task-dependent maps

Given the AI's limited memory, we expect it to remember important information and forget unnecessary information. To investigate what information the AI remembers, we examined whether it can predict past locations from its memory. Specifically, a network was trained to predict past locations from the current LSTM output, and its prediction error was checked. The lower the prediction error, the better the corresponding location can be regarded as remembered.
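A probe of this kind can be sketched as follows (our naming and layer sizes): given the current LSTM output and a lag k, a small network predicts the agent's position k steps in the past, and its error is then inspected per location type.

```python
import torch
import torch.nn as nn

class PastPositionProbe(nn.Module):
    """Predict where the agent was k steps ago from its current memory."""
    def __init__(self, hidden=512, max_lag=256):
        super().__init__()
        self.lag_emb = nn.Embedding(max_lag, 32)
        self.head = nn.Sequential(
            nn.Linear(hidden + 32, 256), nn.ReLU(),
            nn.Linear(256, 2),   # (x, y) position k steps in the past
        )

    def forward(self, h_t, k):
        return self.head(torch.cat([h_t, self.lag_emb(k)], dim=-1))

probe = PastPositionProbe()
h_t = torch.randn(4, 512)           # current hidden states
k = torch.randint(0, 256, (4,))     # lags to probe
target = torch.randn(4, 2)          # stand-in past positions
loss = nn.functional.mse_loss(probe(h_t, k), target)
# Low error at lag k means that past location is still held in memory.
```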

Prediction errors for past locations (the smaller the error, the better the past location is recalled) are shown in Figure 7.

Figure 7: Past-location prediction error varies depending on the kind of path taken (the smaller the error, the better the past location is recalled)

The horizontal axis is the number of steps into the past and the vertical axis is the prediction error. The color of each line indicates the classification of the past position (what kind of path it belongs to): green (Exit) is the last 10% of steps of a looping path (the exit of the loop); orange (Excursion) is a location on a looping path (classified by human observation as a path that circles around and returns to its starting location); blue (Non-Excursion) is a location on a non-looping path.

The prediction error basically increases the further back in time one goes, but it also varies with the type of path: prediction errors for locations on looping paths are large, while those for locations on non-looping paths are small.

Looping paths can be regarded as detours, and it is clear that the AI forgets such paths while remembering well those that do not loop.

On the other hand, locations that are part of a loop but lie at its exit have a smaller prediction error; these can be interpreted as being memorized as landmarks for avoiding entering the same loop again.

Conclusion

The paper presented here showed that when a blind AI is rewarded for approaching a goal and asked to solve a task that requires planning and executing a path from start to goal, it learns a map of its environment.

The agents made good use of long-term memory, chose shortcuts when available, remembered and forgot landmarks around detours, detected collisions, learned to move along walls, and showed remarkable goal-reaching ability despite being blind.

Although this paper does not propose a new algorithm, the design of its research question, which is interesting to both cognitive scientists and AI researchers, the clever experiments devised to answer it, and the clear description of those experiments make it an excellent paper.

The paper is also unusual in being structured around a series of engaging headings, as in a popular-science article, something rarely seen in regular papers. The detailed experimental setup is described in the Appendix, which lays out the technical details clearly while the main text highlights the answers to the research questions.

ICLR, one of the top AI conferences, is often assumed to prize mathematical theory and novel, highly effective techniques. Seeing that scientific understanding of behavioral learning through approaches like this paper's is also highly valued, we may see more AI-based research that promotes scientific understanding of humans and animals in the future.

