RoboTwin 2.0: Scalable Synthetic Data Generation And Benchmark Design For Dual-Arm Manipulation Robots

3 main points
✔️ RoboTwin 2.0 is a framework for automatically generating high-quality synthetic data for dual-arm robot manipulation
✔️ Generates and refines expert-level manipulation code using a closed loop that combines an MLLM with simulation
✔️ Domain randomization and support for diverse robots yield high generalization performance and robustness in real-world environments

RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation
written by Tianxing Chen, Zanxin Chen, Baijun Chen, Zijian Cai, Yibin Liu, Qiwei Liang, Zixuan Li, Xianliang Lin, Yiheng Ge, Zhenyu Gu, Weiliang Deng, Yubin Guo, Tian Nian, Xuanbing Xie, Qiangyu Chen, Kailun Su, Tianling Xu, Guodong Liu, Mengkang Hu, Huan-ang Gao, Kaixuan Wang, Zhixuan Liang, Yusen Qin, Xiaokang Yang, Ping Luo, Yao Mu
(Submitted on 22 Jun 2025)
Comments: Project Page: this https URL

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)

The images used in this article are from the paper, the introductory slides, or were created based on them.

Overview

Dual-arm robotic manipulation is essential for complex real-world tasks such as assembly, tool use, and object handover. However, collecting large-scale data in real-world environments is costly and time-consuming, which limits how well learned manipulation policies generalize. To address this problem, the study proposes RoboTwin 2.0, a large-scale, high-diversity data generation and benchmarking framework.

RoboTwin 2.0 uses a closed-loop approach in which a multimodal large language model (MLLM) automatically generates robot manipulation programs, which are then corrected and improved through simulation. In addition, strong domain randomization along five axes, including background, lighting, object placement, and language instructions, greatly increases visual, physical, and linguistic diversity and improves robustness in real-world environments.

The system supports 731 objects and 50 dual-arm tasks, and ships with over 100,000 pre-collected expert trajectories. Experiments demonstrate improved code generation accuracy, adaptation to different robot arms, and zero-shot generalization to real environments.

Proposed Methodology

RoboTwin 2.0 is a scalable framework for automatically generating high-quality dual-arm robot manipulation data. The methodology consists of three main components: (1) a multimodal code generation agent, (2) domain randomization, and (3) adaptation modules specific to each robot arm.

First, an MLLM automatically generates initial code from a task instruction written in natural language. The code is executed 10 times in the simulation environment, and a vision-language model (VLM) analyzes the execution logs to identify the causes of failure. Based on this feedback, the code is iteratively revised until it reaches a success rate of at least 50%.
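The generate, execute, and revise loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `refine_until_success`, the stub generator, and the stub simulator are hypothetical names, and the stubs stand in for the MLLM, the simulator, and the VLM log analysis. Only the 10 runs per round and the 50% threshold come from the article; the round limit is an assumption.

```python
from typing import Callable, List, Tuple


def refine_until_success(
    generate: Callable[[str, List[str]], str],
    run_once: Callable[[str, int], Tuple[bool, str]],
    instruction: str,
    runs_per_round: int = 10,   # from the article: 10 executions per round
    target_rate: float = 0.5,   # from the article: stop at >= 50% success
    max_rounds: int = 5,        # assumption: cap on revision rounds
) -> Tuple[str, float, int]:
    """Return (best code, its success rate, number of rounds used)."""
    feedback: List[str] = []
    best_code, best_rate = "", -1.0
    for round_idx in range(1, max_rounds + 1):
        code = generate(instruction, feedback)
        results = [run_once(code, seed) for seed in range(runs_per_round)]
        rate = sum(ok for ok, _ in results) / runs_per_round
        if rate > best_rate:
            best_code, best_rate = code, rate
        if rate >= target_rate:
            return best_code, best_rate, round_idx
        # Collect distinct failure descriptions as feedback for the next round.
        feedback = sorted({log for ok, log in results if not ok})
    return best_code, best_rate, max_rounds


def make_stub_generator() -> Callable[[str, List[str]], str]:
    """Stand-in for the MLLM: every call returns a 'newer' program version."""
    calls = {"n": 0}

    def generate(instruction: str, feedback: List[str]) -> str:
        calls["n"] += 1
        return f"# {instruction}\nVERSION = {calls['n']}"

    return generate


def stub_run_once(code: str, seed: int) -> Tuple[bool, str]:
    """Stand-in for simulator plus log analysis: later versions succeed more."""
    version = int(code.rsplit("=", 1)[1])
    ok = seed < 3 * version
    return ok, ("" if ok else "gripper missed object")


code, rate, rounds = refine_until_success(
    make_stub_generator(), stub_run_once, "hand over the bottle"
)
print(rounds, rate)
```

With these stubs the first version succeeds on 3 of 10 runs and the second on 6 of 10, so the loop stops after the second round. In a real pipeline the two callables would wrap the model call and the simulator.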

Next, domain randomization introduces diversity in object placement, background texture, lighting, table height, and language instructions. This makes trained models robust to a wide variety of visual and physical conditions.
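As a rough illustration of randomizing these five axes, the sketch below samples one scene configuration per seed. All names, texture labels, instruction templates, and numeric ranges are assumptions made for the example and are not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import Tuple

# Hypothetical texture names and instruction templates, for illustration only.
BACKGROUNDS = ("wood", "marble", "fabric", "metal")
TEMPLATES = (
    "Pick up the {obj} and hand it to the other arm.",
    "Grab the {obj}, then pass it across.",
    "Use one gripper to lift the {obj} and transfer it.",
)


@dataclass(frozen=True)
class SceneConfig:
    object_xy: Tuple[float, float]  # object placement on the table, in meters
    background: str                 # background texture
    light_intensity: float          # lighting scale factor
    table_height_m: float           # table height, in meters
    instruction: str                # language instruction


def sample_scene(rng: random.Random, obj: str) -> SceneConfig:
    """Draw one value per randomization axis (ranges are assumed)."""
    return SceneConfig(
        object_xy=(rng.uniform(-0.2, 0.2), rng.uniform(-0.1, 0.1)),
        background=rng.choice(BACKGROUNDS),
        light_intensity=rng.uniform(0.3, 1.5),
        table_height_m=rng.uniform(0.70, 0.80),
        instruction=rng.choice(TEMPLATES).format(obj=obj),
    )


# One independent, reproducible scene per seed.
scenes = [sample_scene(random.Random(seed), "bottle") for seed in range(100)]
print(scenes[0])
```

Seeding each scene separately keeps every episode reproducible, which helps when a failure has to be replayed.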

Furthermore, to support five different robot types (Franka, UR5, etc.), the framework prepares a variety of grasp candidates for each object and adaptively generates grasping motions according to each robot's degrees of freedom.
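One way to picture choosing among per-object grasp candidates by degrees of freedom is the sketch below. It is purely illustrative: the `min_dof` field, the quality scores, and the selection rule are assumptions for this example, and the article does not describe the actual criteria used in RoboTwin 2.0.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GraspCandidate:
    name: str
    approach: str   # e.g. "top" or "side"
    quality: float  # hypothetical score, higher is better
    min_dof: int    # hypothetical: fewest joints assumed to reach this pose


def select_grasp(
    candidates: List[GraspCandidate], arm_dof: int
) -> Optional[GraspCandidate]:
    """Pick the best-scoring candidate the arm is assumed able to execute."""
    feasible = [c for c in candidates if arm_dof >= c.min_dof]
    return max(feasible, key=lambda c: c.quality, default=None)


# Made-up candidates for a single object.
CANDIDATES = [
    GraspCandidate("top_pinch", "top", quality=0.9, min_dof=7),
    GraspCandidate("side_wrap", "side", quality=0.7, min_dof=6),
]

for dof in (7, 6, 5):
    chosen = select_grasp(CANDIDATES, dof)
    print(dof, chosen.name if chosen else None)
```

Keeping several candidates per object means an arm with fewer joints can fall back to a grasp it can reach instead of failing outright.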

Experiments

In this study, the effectiveness of RoboTwin 2.0 was demonstrated from multiple perspectives. First, the success rate of automatic code generation was compared with that of the earlier method (RoboTwin 1.0) on 10 tasks. Incorporating feedback based on visual and linguistic information significantly improved the success rate, which reached a maximum of 71.3%.

Next, robustness was compared with and without domain randomization. Models trained on RoboTwin 2.0 data improved their success rate by more than 20%, even in unseen environments. Zero-shot evaluation on four tasks in a real-world environment likewise showed a success rate improvement of more than 20% under unseen backgrounds and in cluttered scenes.

Furthermore, models trained on RoboTwin 2.0 data achieved the highest success rate in the "hard setting" (cluttered environment) of the RoboTwin benchmark, clearly distinguishing them from other methods. These results indicate that RoboTwin 2.0 is a versatile and practical data generation platform that greatly improves generalization to real-world environments.
