
Seedream 3.0 Fill: Next-generation Mask Editing With OneReward
3 main points
✔️ OneReward learns multi-task image editing in a unified way with a single VLM reward model
✔️ Seedream 3.0 Fill achieves high quality on image fill, image expansion, object removal, and text rendering without task-specific SFT
✔️ Experiments show better performance than commercial and OSS models, especially in expansion and removal
OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning
written by Yuan Gong, Xionghui Wang, Jie Wu, Shiyin Wang, Yitong Wang, Xinglong Wu
(Submitted on 28 Aug 2025)
Comments: project url: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The images used in this article are from the paper, the introductory slides, or were created based on them.
Summary
This paper proposes OneReward, a novel reinforcement learning framework for handling multiple editing tasks in image generation in an integrated manner.
Conventional image editing models are often trained separately for individual tasks such as inpainting (image fill), outpainting (image expansion), object removal, and text rendering, and their generality is limited by differences in data distribution and evaluation criteria across tasks.
In addition, conventional reinforcement learning from human feedback (RLHF) requires a separate reward model for each task and evaluation dimension, which hurts training efficiency and consistency.
By using a single vision-language model (VLM) as the sole reward model, OneReward provides evaluation that is consistent with human preferences while still distinguishing between tasks and evaluation criteria.
This allows different tasks to be consolidated into a single unified editing model, yielding a framework that combines efficiency and performance.
Furthermore, Seedream 3.0 Fill, which was developed by applying this framework, has outperformed state-of-the-art commercial and open source models.
Proposed Methodology
The central mechanism of OneReward is to handle multiple tasks and multidimensional evaluation criteria in an integrated manner, using a single vision-language model (VLM) as the reward model.
Whereas previously a separate reward model had to be trained for each criterion, such as textual consistency, aesthetic quality, structural consistency, and removal quality, OneReward embeds the task identifier and the evaluation criterion in the evaluation query and has the VLM judge which image in the input pair is better.
This comparison-based design allows training to proceed even when the different evaluation dimensions disagree, for example when an image wins on aesthetic quality but loses on structural consistency.
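To make this concrete, here is a minimal sketch of what such a pairwise reward query could look like. It assumes a VLM that exposes logits for the tokens "Yes" and "No"; build_reward_query and preference_probability are illustrative names, not the paper's actual API.

```python
# Minimal sketch of OneReward-style pairwise reward queries.
# Assumption: a VLM scores the image pair and exposes logits for "Yes" and
# "No"; the model call itself is omitted here.
import math

def build_reward_query(task: str, criterion: str) -> str:
    """Embed the editing task and the evaluation criterion in one Yes/No
    question so a single VLM can serve every task/criterion combination."""
    return (
        f"Task: {task}. Criterion: {criterion}. "
        "Is the first image better than the second image? Answer Yes or No."
    )

def preference_probability(yes_logit: float, no_logit: float) -> float:
    """Softmax over the Yes/No logits: the probability assigned to "Yes"
    serves as the scalar preference signal for the image pair."""
    m = max(yes_logit, no_logit)
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)

# The same reward model is queried with different task/criterion pairs.
print(build_reward_query("object removal", "removal quality"))
print(preference_probability(yes_logit=2.1, no_logit=-0.5))  # ~0.93
```

The key point is that the task and criterion live in the text query, so switching tasks or evaluation dimensions requires no new reward model, only a different prompt.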
During training, the pre-trained diffusion model is kept frozen as a reference model, and images partially denoised by the policy model are compared against those produced by the reference model.
For each comparison, the reward model judges which image better matches human preferences with a binary "Yes/No" output, and the probability assigned to "Yes" is used as the reinforcement learning signal.
This lets the model learn multidimensional preferences simultaneously in a multi-task setting, improving performance across all tasks without additional task-specific SFT (Supervised Fine-Tuning).
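As a rough illustration of this training loop, the sketch below strings these pieces together. It is a structural sketch under stated assumptions, not the authors' implementation: denoise_policy, denoise_reference, and reward_prob are placeholder callables standing in for the policy model, the frozen reference model, and the VLM reward model, and the negative-log-preference loss is a simplification.

```python
# Structural sketch of the RL step: partial denoising by the policy and the
# frozen reference, a VLM preference probability as the reward, and a simple
# negative-log-preference loss (a simplification, not the paper's exact loss).
import math
import random
from typing import Callable, Sequence

def rl_step(
    batch: Sequence[dict],
    tasks: Sequence[str],
    criteria: Sequence[str],
    denoise_policy: Callable[[dict, int], object],
    denoise_reference: Callable[[dict, int], object],
    reward_prob: Callable[[object, object, str, str], float],
    t_partial: int = 10,
) -> float:
    """Compare partially denoised policy/reference outputs with the VLM reward
    and return the mean negative log preference probability as the loss."""
    total = 0.0
    for sample in batch:
        task = random.choice(tasks)          # e.g. "image fill", "object removal"
        criterion = random.choice(criteria)  # e.g. "aesthetic quality"
        x_policy = denoise_policy(sample, t_partial)   # partial denoising, policy
        x_ref = denoise_reference(sample, t_partial)   # partial denoising, frozen reference
        p_win = reward_prob(x_policy, x_ref, task, criterion)  # P("Yes") from the VLM
        total += -math.log(p_win + 1e-8)     # higher preference -> lower loss
    return total / len(batch)

# Toy usage with dummy callables, just to show the data flow.
loss = rl_step(
    batch=[{"image": None, "mask": None, "prompt": "remove the car"}],
    tasks=["object removal"],
    criteria=["removal quality"],
    denoise_policy=lambda s, t: "policy_output",
    denoise_reference=lambda s, t: "reference_output",
    reward_prob=lambda a, b, task, crit: 0.7,
)
print(round(loss, 3))  # ~0.357
```

Because the task and criterion are sampled per example, a single policy is pushed toward human preferences across all editing tasks and evaluation dimensions at once.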
Experiments
The authors compared the performance of Seedream 3.0 Fill, trained with OneReward, to state-of-the-art models such as Adobe Photoshop, Ideogram, and Flux Fill [Pro].
Evaluations were performed on four main tasks: image fill, image expansion (with and without a prompt), object removal, and text rendering, and were measured along multiple dimensions, including usability rate, text consistency, structural consistency, aesthetic quality, and removal quality.
The results showed that Seedream 3.0 Fill outperformed existing methods across all tasks; in image expansion (without a prompt) in particular, its usability rate reached 87.54%, significantly ahead of the other models.
For object removal it achieved a removal quality of 86.33%, generating the fewest unwanted objects.
In addition, in the human Good-Same-Bad (GSB) evaluation, the model trained with OneReward received a significantly higher share of "Good" judgments than the base model.
These experiments demonstrated that OneReward works effectively for a variety of editing tasks with a single reward model, enabling unified and high-performance image editing.