RLV: A Framework For Using Video Of People Solving Tasks For Reinforcement Learning
3 main points
✔️ Proposes RLV, a framework that uses offline observations for reinforcement learning
✔️ Proposes a model that resolves the domain shift between observation data and interaction data
✔️ Shows higher sample efficiency than standard reinforcement learning
Reinforcement Learning with Videos: Combining Offline Observations with Interaction
written by Karl Schmeckpeper, Oleh Rybkin, Kostas Daniilidis, Sergey Levine, Chelsea Finn
(Submitted on 12 Nov 2020)
Comments: Accepted at CoRL 2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Introduction
In this article, we present the paper "Reinforcement Learning with Videos: Combining Offline Observations with Interaction," presented at CoRL 2020. Although reinforcement learning has been effective in recent years at learning a variety of robotic tasks, it still requires gathering a wide variety of experience to achieve good general performance. Obtaining such varied experience is difficult, as evidenced by the poor sample efficiency that plagues reinforcement learning. In contrast, it is very easy to collect data of humans solving a task. This study therefore asks whether offline observations (videos) of a human solving a task can help reinforcement learning learn a robot's policy more efficiently. We refer to such data, which lacks action and reward labels, as observation data, and we define (online) interaction data as the data obtained when the robot itself acts in the environment. There are two main challenges to making this possible.
1. Allowing the robot to update its policy using observation data (the sketch after this list illustrates how such data differs from interaction data).
2. Coping with the domain shift between interaction and observation data caused by differences in action spaces, agent embodiments, viewpoints, and environments.
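To make the distinction concrete, here is a minimal sketch in Python (the class and field names are hypothetical, not taken from the paper) of how the two data types differ structurally: interaction transitions carry the action and reward labels a standard RL update needs, while observation transitions carry only states.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class InteractionTransition:
    """One step of online interaction data: fully labeled (s, a, r, s', done)."""
    state: np.ndarray
    action: np.ndarray
    reward: float
    next_state: np.ndarray
    done: bool

@dataclass
class ObservationTransition:
    """One step of offline observation data (e.g. a frame pair from a human video):
    only the states are available; action and reward labels are missing."""
    state: np.ndarray
    next_state: np.ndarray
    # Placeholders that must somehow be filled in before this data can drive an
    # RL update; how to fill them is part of what the proposed approach addresses.
    inferred_action: Optional[np.ndarray] = None
    inferred_reward: Optional[float] = None
```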
In fact, methods for learning policies from observation data already exist. However, the observation data may not solve the task optimally, and as a result, a policy learned purely by imitating it will also be suboptimal. Moreover, even with perfect observation data, it is said that such imitation-based methods still cannot learn a good policy. Reinforcement learning, in contrast, can learn from both successful and unsuccessful trajectories, so it is expected to make more effective use of observation data. For these reasons, this study aims to use observation data within reinforcement learning.
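As a rough illustration of why this matters for the learning loop, the sketch below (hypothetical function and buffer names, not the paper's exact algorithm) shows an off-policy update drawing from two replay buffers at once: a standard buffer filled by online interaction, and an observation buffer whose transitions can only contribute once their missing action and reward labels have been filled in by some mechanism, for example a learned model, which is an assumption here rather than a claim about the paper's method.

```python
import random

def sample_mixed_batch(interaction_buffer, observation_buffer,
                       batch_size=256, obs_fraction=0.5):
    """Sample a training batch that mixes interaction and observation data.

    interaction_buffer: list of fully labeled transitions (s, a, r, s', done).
    observation_buffer: list of observation transitions whose missing action and
        reward labels have already been filled in (e.g. by a learned model --
        an assumption for this sketch, not taken from the article).
    """
    n_obs = int(batch_size * obs_fraction)
    n_int = batch_size - n_obs
    batch = random.sample(interaction_buffer, min(n_int, len(interaction_buffer)))
    batch += random.sample(observation_buffer, min(n_obs, len(observation_buffer)))
    random.shuffle(batch)
    return batch  # fed to an ordinary off-policy RL update (e.g. an actor-critic)
```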
The proposed approach will now be explained in the next section.