
LLM Learning That Combines Diversity And Task Specialization: TCIA Mechanism And Experimental Results

3 main points
✔️ TCIA is an instruction augmentation framework that combines diversity with task conformance
✔️ It decomposes instructions into base queries and constraints and generates diverse instructions via breadth-first search (BFS)
✔️ Experiments show an average performance improvement of 8.7%, also outperforming GPT-4o

TCIA: A Task-Centric Instruction Augmentation Method for Instruction Finetuning
written by Simin Ma, Shujian Liu, Jun Tan, Yebowen Hu, Song Wang, Sathish Reddy Indurthi, Sanqiang Zhao, Liwei Wu, Jianbing Han, Kaiqiang Song
(Submitted on 28 Aug 2025)
Comments: Published on arxiv.
Subjects: Artificial Intelligence (cs.AI)

The images used in this article are from the paper, the introductory slides, or were created based on them.

Summary

This paper proposes TCIA (Task-Centric Instruction Augmentation), a task-centric instruction data augmentation method for LLM fine-tuning that is aligned with real-world applications.

Conventional methods have tried to ensure diversity through self-generated instruction data augmentation, but they suffer from repetitive instructions and "task drift," i.e., deviation from the target task.
In the real world, performance specialized for a specific task is often required rather than a general-purpose model, so a mechanism that maintains task conformance as well as diversity is essential.

TCIA decomposes each natural-language instruction into a combination of a "base query" and "constraints," and expands the instruction set broadly by manipulating the constraints.
Experiments show that TCIA achieves an average performance improvement of 8.7% on practical tasks such as meeting summarization, exceeding GPT-4o in some cases.
TCIA thus provides a new framework for LLM tuning that is robust in realistic applications.

Proposed Methodology

TCIA is a systematic instruction expansion framework consisting of six steps.

First, each natural-language instruction is decomposed into a "base query" and "constraints," which clarifies its semantic structure.
Next, a diverse constraint database built from public datasets (e.g., Tulu-3) is used to retrieve constraints associated with similar tasks.
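
To make the decomposition concrete, here is a minimal sketch of how an instruction might be represented as a base query plus constraints, and how related constraints could be retrieved from a pool such as Tulu-3. The class and function names and the simple token-overlap retrieval are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass, field

# Hypothetical data model: TCIA treats an instruction as a base query plus constraints.
@dataclass
class DecomposedInstruction:
    base_query: str                                         # the underlying task/question
    constraints: list[str] = field(default_factory=list)    # formatting, length, style, ...

# Example decomposition of a meeting-summarization instruction.
example = DecomposedInstruction(
    base_query="Summarize the meeting transcript",
    constraints=[
        "Use bullet points",
        "Keep the summary under 100 words",
        "Mention every action item and its owner",
    ],
)

def retrieve_related_constraints(decomposed, constraint_db, top_k=5):
    """Retrieve constraints from a public-dataset pool (e.g., Tulu-3) whose
    source tasks resemble the base query. A real system would likely use
    embedding similarity; token overlap stands in for it here."""
    query_tokens = set(decomposed.base_query.lower().split())
    scored = [
        (len(query_tokens & set(task.lower().split())), constraint)
        for task, constraint in constraint_db   # (source task, constraint) pairs
    ]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [constraint for _, constraint in scored[:top_k]]
```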

Then, using breadth-first search (BFS), operations such as "add," "delete," and "replace" are applied repeatedly to generate a diverse yet task-conformant set of constraints.
The generated constraint sets are converted back into natural-language instructions, whose quality is ensured by checking for missing constraints and resolving inconsistencies.
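
The BFS expansion can be sketched as follows: each search node is a set of constraints, and children are produced by the "add," "delete," and "replace" operations. The `is_task_conformant` stub stands in for the LLM-based conformance check described in the paper, and the depth limit and unbounded branching are simplifying assumptions.

```python
from collections import deque

def is_task_conformant(constraints):
    """Placeholder for the LLM-based check that a constraint set still
    matches the original task; always accepts in this sketch."""
    return True

def expand_constraints(seed_constraints, candidate_pool, max_depth=3):
    """BFS over constraint sets. Each node is a frozenset of constraints;
    children are produced by adding, deleting, or replacing one constraint."""
    root = frozenset(seed_constraints)
    queue = deque([(root, 0)])
    seen = {root}
    results = []

    while queue:
        node, depth = queue.popleft()
        results.append(node)
        if depth == max_depth:
            continue

        children = []
        # "add": introduce a retrieved constraint not yet in the set
        for c in candidate_pool:
            if c not in node:
                children.append(node | {c})
        # "delete": drop one existing constraint
        for c in node:
            children.append(node - {c})
        # "replace": swap one existing constraint for a retrieved one
        for old in node:
            for new in candidate_pool:
                if new not in node:
                    children.append((node - {old}) | {new})

        for child in children:
            if child not in seen and is_task_conformant(child):
                seen.add(child)
                queue.append((child, depth + 1))

    return results
```

A practical implementation would cap the number of children kept per node; otherwise the frontier grows combinatorially with the size of the candidate pool.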

Finally, responses are generated with multiple LLMs and screened by an LLM judge along five dimensions (quality, usefulness, accuracy, consistency, etc.), so that only the best instruction-response pairs are retained.
The result is a large training dataset that is faithful to the task while maintaining diversity, enabling efficient and realistic fine-tuning.
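
A rough sketch of this selection step is shown below: each candidate LLM produces a response, an LLM judge scores it along the evaluation dimensions, and only the top-scoring pair is kept. The prompt wording, the 1-5 scale, the fifth dimension name, and the callable interfaces are assumptions for illustration, not the paper's exact setup.

```python
def judge(instruction, response, llm_judge):
    """Score a candidate response along five dimensions with a judge LLM.
    `llm_judge` is assumed to be a callable that returns a numeric score
    as text; the dimension list and scale are illustrative."""
    dims = ["quality", "usefulness", "accuracy", "consistency", "instruction-following"]
    scores = {}
    for dim in dims:
        prompt = (
            f"Rate the {dim} of the response on a 1-5 scale.\n"
            f"Instruction:\n{instruction}\n\nResponse:\n{response}\n\nScore:"
        )
        scores[dim] = float(llm_judge(prompt))
    return scores

def best_pair(instruction, generator_llms, llm_judge):
    """Generate one response per candidate LLM, score each with the judge,
    and keep only the highest-scoring instruction-response pair."""
    candidates = [gen(instruction) for gen in generator_llms]
    scored = [
        (sum(judge(instruction, response, llm_judge).values()), response)
        for response in candidates
    ]
    scored.sort(key=lambda x: x[0], reverse=True)
    return instruction, scored[0][1]
```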

Experiments

The authors test the effectiveness of TCIA at both the instruction and model levels.

First, by comparing TCIA with conventional methods such as WizardLM, the authors show that TCIA maintains a high level of task conformance while preserving instructional diversity.
For example, even after three expansions, TCIA maintained a task conformance rate of almost 100% and outperformed WizardLM in the diversity metric.

Next, the authors fine-tuned Llama-3.1-8B on four practical tasks, such as meeting summarization and information extraction, and observed an average performance improvement of 8.7%.
It is particularly noteworthy that the results outperformed GPT-4o.

In addition, experiments on adaptation to new constraints confirmed that models trained with TCIA data flexibly handle unseen requirements, such as switching from bulleted to numbered lists or restricting output length.
Furthermore, the models maintained good scores on public benchmarks such as MMLU-Pro and GPQA, demonstrating both task-specific and general-purpose performance.

