Simulation X Reinforcement Learning! From Simulation To Real Automated Driving Systems (Part 1)

Self-Driving 06/10/2020

3 main points
✔️ Simulation x Reinforcement learning
✔️ Real-world self-driving tests
✔️ Sim-to-real policy transfer was achieved.

Simulation-Based Reinforcement Learning for Real-World Autonomous Driving
written by Błażej Osiński, Adam Jakubowski, Piotr Miłoś, Paweł Zięcina, Christopher Galias, Silviu Homoceanu, Henryk Michalewski
(Submitted on 29 Nov 2019 (v1), last revised 4 Mar 2020 (this version, v3))
Comments: Accepted at 2020 IEEE International Conference on Robotics and Automation (ICRA)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Introduction

This article introduces a paper on obtaining a real-world automated driving system through reinforcement learning in simulation. The model takes RGB images from a camera and their semantic segmentation as input, and is trained to output steering commands (steering wheel operation). In the authors' experiments, the driving system trained in simulation was successfully transferred to a real vehicle (sim-to-real). The first half of this article introduces the proposed method; the second half covers the experiments and analysis. Let's take a look.

Motivation

The motivation behind this research is the effective use of simulation for building real-world automated driving systems. Simulation makes it possible to perform dangerous actions that cannot be attempted in real life, and data is easier to collect in simulation than in the real world. Reinforcement learning enables exploration beyond the scope of human driving and supports end-to-end learning, which is very popular today. In automated driving, end-to-end learning can avoid the accumulation of errors* that occurs when modules are trained separately and then combined. Another reason is that designing each module with hand-written rules is impractical because it requires too much work.

The focus of this paper is the use of simulation data to build a driving system for a real vehicle: the model was trained on 100 years of simulated driving data.

* The accumulation of errors means that the error of module 1 affects the operation of module 2, so the error of module 2 becomes larger; that larger error in turn affects the operation of module 3, and so on.
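
To make the accumulation of errors concrete, here is a minimal toy sketch in Python. It is not taken from the paper: the function name accumulated_errors, the gain factor, and the per-module error values are all hypothetical, and the model simply assumes that each module amplifies the error it receives and then adds its own.

```python
def accumulated_errors(own_errors, gain):
    """Return the error at the output of each module in a pipeline.

    Toy model (an assumption, not from the paper): each module scales the
    error it receives by `gain` and then adds its own error.
    """
    errors = []
    incoming = 0.0  # the first module receives error-free input
    for own in own_errors:
        incoming = gain * incoming + own
        errors.append(incoming)
    return errors


if __name__ == "__main__":
    # Three modules, each contributing the same small error of its own.
    outputs = accumulated_errors([0.1, 0.1, 0.1], gain=1.5)
    for i, err in enumerate(outputs, start=1):
        print(f"module {i}: output error = {err:.3f}")
```

Under these assumed numbers the output error grows at every stage even though each module's own error stays the same. This is the effect that end-to-end learning aims to avoid by optimizing the whole pipeline at once.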