
Seed Diffusion Preview: Next-generation Code Generation Model That Combines Fast Inference And High Performance
3 main points
✔️ Proposed Seed Diffusion Preview, a fast inference-based code generation model using discrete state diffusion
✔️ Combines speed and quality with two-stage learning, generation order constraints, and block parallel generation
✔️ Achieves 2,146 tokens/sec on an H20 GPU while demonstrating high performance on a variety of code benchmarks
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
written by Yuxuan Song, Zheng Zhang, Cheng Luo, Pengyang Gao, Fan Xia, Hao Luo, Zheng Li, Yuehang Yang, Hongli Yu, Xingwei Qu, Yuwei Fu, Jing Su, Ge Zhang, Wenhao Huang, Mingxuan Wang, Lin Yan, Xiaoying Jia, Jingjing Liu, Wei-Ying Ma, Ya-Qin Zhang, Yonghui Wu, Hao Zhou
(Submitted on 4 Aug 2025)
Comments: Demo is available at this https URL; project page is this https URL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
The images used in this article are from the paper, the introductory slides, or were created based on them.
Overview
This research proposes Seed Diffusion Preview, a fast inference model based on discrete-state diffusion (DSD), as a new approach for large-scale language modeling.
Conventional autoregressive (AR) models generate tokens one at a time, which caps inference speed even when accuracy is high.
Diffusion models, by contrast, can generate tokens in parallel, but in the natural language domain they have struggled with both speed and quality, because they were originally designed around continuous data and rely on an iterative denoising procedure.
Our method combines a learning pipeline dedicated to code generation with several improvements: two-stage curriculum learning, training with constrained generation order, on-policy learning, and block-wise parallel inference.
The result is fast inference of 2,146 tokens/second on H20 GPUs while maintaining high performance on multiple code generation benchmarks, including HumanEval, LiveCodeBench, and MBXP.
This breaks the trade-off between speed and quality and demonstrates the practical feasibility of the diffusive language model.
Proposed Methodology
Seed Diffusion Preview integrates the following elements to overcome the unique challenges of natural language processing while leveraging the strengths of diffusion models.
First, it employs "Two-Stage Curriculum Learning (TSC)": the model initially builds a robust foundation with a mask-based corruption process, and an edit-based corruption process is added in later stages to improve its self-correction capability.
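The paper does not publish the exact corruption operators, but the two stages can be illustrated with a minimal sketch: stage one replaces tokens with a mask symbol, while stage two applies random insert/delete/replace edits so the model must also fix tokens that are wrong, not merely missing. All names here (`MASK`, `mask_corrupt`, `edit_corrupt`) are illustrative assumptions, not the paper's API.

```python
import random

MASK = "<mask>"  # illustrative mask symbol, not the model's actual token

def mask_corrupt(tokens, rate):
    """Stage 1 (assumed form): replace a fraction of tokens with a mask."""
    return [MASK if random.random() < rate else t for t in tokens]

def edit_corrupt(tokens, n_edits, vocab):
    """Stage 2 (assumed form): apply random insert/delete/replace edits,
    so training also teaches the model to correct erroneous tokens."""
    tokens = list(tokens)
    for _ in range(n_edits):
        op = random.choice(["insert", "delete", "replace"])
        i = random.randrange(len(tokens))
        if op == "insert":
            tokens.insert(i, random.choice(vocab))
        elif op == "delete" and len(tokens) > 1:
            del tokens[i]
        else:
            tokens[i] = random.choice(vocab)
    return tokens
```

The intuition is that a model trained only on mask recovery never sees incorrect tokens at training time, so the edit-based stage is what gives it the ability to revise its own earlier outputs.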
Second, "generation-order constraint learning" uses an ELBO-maximization criterion to extract high-quality trajectories from the many candidate generation orders, suppressing order variation.
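At a high level, this amounts to sampling candidate generation orders and keeping only the best-scoring trajectory for training. The sketch below is a loose illustration of that selection step; `score_order` stands in for the model's ELBO estimate, and the function name and sampling scheme are assumptions rather than the paper's actual procedure.

```python
import random

def select_training_order(sequence, score_order, num_candidates=8):
    """Illustrative sketch: sample several candidate generation orders
    (permutations of positions), score each with an ELBO-style estimate
    supplied as `score_order`, and keep only the best trajectory."""
    positions = list(range(len(sequence)))
    candidates = []
    for _ in range(num_candidates):
        order = positions[:]
        random.shuffle(order)
        candidates.append(order)
    # The highest-scoring order becomes the training trajectory.
    return max(candidates, key=score_order)
```

In the real system the scoring would come from the diffusion model itself, so the training data is filtered toward orders the model can actually decode well.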
Third, "on-policy learning" optimizes the number of inference steps, improving speed. Finally, "block-wise parallel generation" is used at inference time to generate tokens efficiently while maintaining causality between blocks.
In addition to these designs, internal infrastructure optimization and KV caching are combined to achieve both speed and quality.
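The decoding loop can be sketched as follows: tokens inside a block are denoised in parallel, while blocks themselves are produced left to right, so each new block conditions only on already-finalized text (whose states could be KV-cached). The function names and signature are hypothetical; the paper does not specify this interface.

```python
def generate_blockwise(denoise_block, prompt, num_blocks, block_size):
    """Hypothetical block-wise parallel decoding loop.

    `denoise_block(context, block_size)` is assumed to run the diffusion
    denoiser over one block of masked tokens in parallel, conditioned on
    all previously finalized tokens."""
    output = list(prompt)
    for _ in range(num_blocks):
        # Tokens within the block are refined in parallel; blocks are
        # emitted left to right, preserving causality between blocks.
        block = denoise_block(output, block_size)
        output.extend(block)  # the finalized block is frozen (cacheable)
    return output
```

Freezing finished blocks is what makes KV caching applicable: once a block is final, its attention states never change, just as in AR decoding.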
Experiments
The model was evaluated on a variety of benchmarks in the code generation field.
Basic coding performance was measured on HumanEval and MBPP, practical and contamination-free competitive-programming performance on BigCodeBench and LiveCodeBench, and multilingual code generation on MBXP.
We also validated performance based on natural user queries with NaturalCodeBench.
We also evaluated the ability to modify existing code through code editing tasks such as Aider and CanItEdit.
The results showed that Seed Diffusion Preview matched or exceeded similarly sized models (such as Mercury Coder and Gemini Diffusion) on many metrics, while achieving 2 to 3 times faster inference.
The improvement was especially pronounced on editing tasks, confirming the effectiveness of the diffusion-based approach for both code generation and code editing.