RoboTwin 2.0: Scalable Synthetic Data Generation And Benchmark Design For Dual-Arm Manipulation Robots

29/07/2025

3 main points
✔️ RoboTwin 2.0 is a framework for automatic generation of high-quality synthetic data for two-armed robot manipulation
✔️ Generates and modifies expert-level manipulation code using a closed-loop approach combining MLLM and simulation
✔️ Domain randomization and support for diverse robots yield high generalization performance and robustness in real-world environments

RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation
written by Tianxing Chen, Zanxin Chen, Baijun Chen, Zijian Cai, Yibin Liu, Qiwei Liang, Zixuan Li, Xianliang Lin, Yiheng Ge, Zhenyu Gu, Weiliang Deng, Yubin Guo, Tian Nian, Xuanbing Xie, Qiangyu Chen, Kailun Su, Tianling Xu, Guodong Liu, Mengkang Hu, Huan-ang Gao, Kaixuan Wang, Zhixuan Liang, Yusen Qin, Xiaokang Yang, Ping Luo, Yao Mu
(Submitted on 22 Jun 2025)
Comments: Project Page: this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)

The images used in this article are from the paper, the introductory slides, or were created based on them.

Overview

Two-armed robot manipulation is essential for complex real-world tasks such as assembly, tool use, and object handover. However, collecting data at scale in real-world environments is costly in both time and money, and the manipulation strategies learned from limited data generalize poorly. To address this, the study proposes RoboTwin 2.0, a large-scale, high-diversity data generation and benchmarking framework.

RoboTwin 2.0 uses a closed-loop approach in which a multimodal large language model (MLLM) automatically generates robot manipulation programs, which are then corrected and improved through simulation. In addition, strong domain randomization along five axes, including background, lighting, object placement, and language instructions, greatly increases the visual, physical, and linguistic diversity of the data and improves robustness in real-world environments.

The system supports 731 objects and 50 two-armed tasks, and ships with more than 100,000 pre-collected expert trajectories. Experiments show improved code generation accuracy, adaptation to different robot arms, and zero-shot generalization to real environments.

Proposed Methodology

RoboTwin 2.0 is a scalable framework for automatic generation of high-quality, two-armed robot manipulation data. The methodology consists of three main components: (1) a multimodal code generation agent, (2) domain randomization, and (3) robot arm-specific adaptive modules.

First, the MLLM automatically generates initial code from task instructions written in natural language. This code is executed 10 times in the simulation environment, and a vision-language model (VLM) analyzes the execution logs and the causes of failure. Based on this feedback, the code is iteratively revised until it reaches a success rate of at least 50%.
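The generate, execute, and analyze loop described above can be sketched as follows. This is a minimal illustration, not the RoboTwin 2.0 implementation: `refine_program`, the three callables, and the toy stand-ins are hypothetical names, and the `max_rounds` cap is an assumption, since the article does not say how many revision rounds are allowed. Only the 10 runs per round and the 50% threshold come from the text.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not RoboTwin 2.0 APIs):
Generator = Callable[[str, List[str]], str]         # (instruction, feedback) -> program
Simulator = Callable[[str, int], Tuple[bool, str]]  # (program, seed) -> (success, log)
Analyzer = Callable[[List[str]], str]               # failure logs -> feedback text


def refine_program(instruction: str, generate: Generator, simulate: Simulator,
                   analyze: Analyzer, runs_per_round: int = 10,
                   target_rate: float = 0.5, max_rounds: int = 5) -> Tuple[str, float]:
    """Regenerate a program until its simulated success rate reaches target_rate."""
    feedback: List[str] = []
    best_program, best_rate = "", -1.0
    for _ in range(max_rounds):  # max_rounds is an assumed safety cap
        program = generate(instruction, feedback)
        results = [simulate(program, seed) for seed in range(runs_per_round)]
        rate = sum(ok for ok, _ in results) / runs_per_round
        if rate > best_rate:
            best_program, best_rate = program, rate
        if rate >= target_rate:
            break
        # Feed only the failed runs back to the analyzer (the VLM in the article).
        feedback.append(analyze([log for ok, log in results if not ok]))
    return best_program, best_rate


# Toy stand-ins so the loop can run without a model or a simulator.
def toy_generate(instruction: str, feedback: List[str]) -> str:
    return f"v{len(feedback)}"  # each round of feedback yields a new "version"


def toy_simulate(program: str, seed: int) -> Tuple[bool, str]:
    version = int(program[1:])
    ok = seed < 3 * (version + 1)  # later versions succeed on more seeds
    return ok, "ok" if ok else f"seed {seed}: grasp missed"


def toy_analyze(failure_logs: List[str]) -> str:
    return f"{len(failure_logs)} failed runs"
```

With the toy stand-ins, the first version succeeds on 3 of 10 runs and is revised, and the second succeeds on 6 of 10 and is accepted. In a real pipeline the three callables would wrap the MLLM, the simulator, and the VLM respectively.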

Next, domain randomization introduces diversity in object placement, background texture, lighting, table height, and language instructions. This makes the trained model robust to a variety of visual and physical environments.
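As a rough illustration of randomizing the five factors listed above, the sketch below draws one scene configuration per call. Everything specific in it is an assumption made for the example: the `SceneConfig` fields, the texture names, the numeric ranges, and the instruction templates are hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import Tuple

# Hypothetical value pools; the real framework's assets and ranges are not given here.
BACKGROUNDS = ["wood", "marble", "fabric"]
TEMPLATES = ["pick up the {obj}", "grab the {obj}", "please lift the {obj}"]


@dataclass(frozen=True)
class SceneConfig:
    object_xy: Tuple[float, float]  # object position on the table, metres
    background_texture: str
    light_intensity: float          # relative to a nominal value of 1.0
    table_height_m: float
    instruction: str


def sample_scene(rng: random.Random, obj: str = "bottle") -> SceneConfig:
    """Draw one randomized scene; each field varies independently of the others."""
    return SceneConfig(
        object_xy=(rng.uniform(-0.3, 0.3), rng.uniform(-0.2, 0.2)),
        background_texture=rng.choice(BACKGROUNDS),
        light_intensity=rng.uniform(0.4, 1.6),
        table_height_m=rng.uniform(0.70, 0.80),
        instruction=rng.choice(TEMPLATES).format(obj=obj),
    )
```

Passing an explicit `random.Random(seed)` keeps each generated episode reproducible, which is useful when a failed trajectory has to be replayed.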

Furthermore, to accommodate five different robot types (Franka, UR5, etc.), the framework prepares a variety of grasp candidates for each object and adaptively generates grasping motions according to each robot's degrees of freedom.
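One way to picture this arm-specific adaptation is as a selection over pre-annotated grasp candidates. The sketch below assumes each arm carries an ordered list of preferred approach directions and picks the best-scoring candidate in the first direction that has one. The data model (`GraspCandidate`, `ArmProfile`), the placeholder arm names, and the preference lists are all hypothetical and do not describe the actual module.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class GraspCandidate:
    name: str
    approach: str  # e.g. "top" or "side"
    score: float   # hypothetical quality score, higher is better


@dataclass(frozen=True)
class ArmProfile:
    name: str
    dof: int
    preferred_approaches: Tuple[str, ...]  # ordered from most to least preferred


def select_grasp(candidates: List[GraspCandidate],
                 arm: ArmProfile) -> Optional[GraspCandidate]:
    """Return the best candidate in the arm's most preferred feasible approach."""
    for approach in arm.preferred_approaches:
        matching = [c for c in candidates if c.approach == approach]
        if matching:
            return max(matching, key=lambda c: c.score)
    return None  # no candidate suits this arm


# Example data with placeholder arm names (not real robot specifications).
CAN_GRASPS = [
    GraspCandidate("can_top", "top", 0.9),
    GraspCandidate("can_side", "side", 0.7),
]
ARM_A = ArmProfile("arm_a", dof=7, preferred_approaches=("top", "side"))
ARM_B = ArmProfile("arm_b", dof=6, preferred_approaches=("side",))
```

Here `select_grasp(CAN_GRASPS, ARM_A)` returns the top-down candidate, while `select_grasp(CAN_GRASPS, ARM_B)` returns the side grasp, so the same object yields different motions for different arms.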

Experiments

In this study, the effectiveness of RoboTwin 2.0 was evaluated from multiple perspectives. First, the success rate of automatic code generation was compared with that of the previous method (RoboTwin 1.0) on 10 different tasks. The results showed that incorporating feedback based on visual and language information significantly improved the success rate, which reached a maximum of 71.3%.

Next, robustness was compared with and without domain randomization. Models trained on RoboTwin 2.0 data showed an improvement in success rate of more than 20%, even in unseen environments. Zero-shot validation on four tasks in a real-world environment also showed a success rate improvement of more than 20% under unseen backgrounds and cluttered scenes.

Furthermore, models trained on RoboTwin 2.0 achieved the highest success rate in the "hard setting" (cluttered environment) of the RoboTwin benchmark, clearly separating them from models trained with other methods. These results indicate that RoboTwin 2.0 is a versatile and practical data generation platform that greatly improves the ability of trained policies to generalize to real-world environments.

nakata
