Exploring LLM's Persuasion Resistance And Flexibility! New Evaluation And Training Methods With DuET-PD And Holistic DPO

22/09/2025

3 main points
✔️ LLMs face the dual challenges of being easily fooled by misinformation and rejecting valid corrections
✔️ DuET-PD systematically evaluates LLM position changes under positive and negative persuasion in the knowledge and safety domains
✔️ Holistic DPO combines resistance to misinformation with acceptance of corrections, significantly improving model reliability

Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD
written by Bryan Chen Zhengyu Tan, Daniel Wai Kit Chin, Zhengyuan Liu, Nancy F. Chen, Roy Ka-Wei Lee
(Submitted on 24 Aug 2025 (v1), last revised 9 Sep 2025 (this version, v3))
Comments: To appear at EMNLP 2025
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)

The images used in this article are from the paper, the introductory slides, or were created based on them.

Overview

This paper studies how LLMs change their positions in persuasive dialogue and systematically examines their robustness and adaptability.

In high-stakes domains such as healthcare and finance, a model must be able to accept valid corrections while not being swayed by incorrect persuasion.
Existing LLMs, however, suffer from two opposing problems: "gullibility," which makes them easily deceived by misinformation, and "stubbornness," which makes them reject valid corrections.

To address this issue, the authors proposed DuET-PD (Dual Evaluation for Trust in Persuasive Dialogues) and conducted multi-turn dialogue experiments in the knowledge (MMLU-Pro) and safety (SALAD-Bench) domains.
They further showed that existing training methods are insufficient and proposed a new training method called "Holistic DPO," which aims to achieve both acceptance of valid corrections and resistance to misinformation.

Proposed Method

The authors first designed an evaluation framework called DuET-PD.

It consists of three stages: (1) measuring the accuracy of the initial response; (2) applying "negative persuasion (NEG)" with misinformation if the initial answer was correct, or "positive persuasion (POS)" toward the correct answer if it was incorrect; and (3) re-checking the model's position after each persuasion turn.
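The three-stage protocol can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the model is assumed to be any callable that maps the conversation so far to an answer letter, and the names run_dialogue, repetition_appeal, and toy_model are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface: maps the conversation so far,
# a list of (role, text) pairs, to the answer option the model picks.
Model = Callable[[List[Tuple[str, str]]], str]


def run_dialogue(model: Model, question: str, correct: str, wrong: str,
                 appeal: Callable[[str], str], turns: int = 3) -> dict:
    """Sketch of a DuET-PD-style dialogue for a single question."""
    history = [("user", question)]
    # Stage 1: record the initial answer.
    initial = model(history)
    history.append(("assistant", initial))
    # Stage 2: choose the persuasion direction from the initial answer.
    if initial == correct:
        direction, target = "NEG", wrong     # push toward misinformation
    else:
        direction, target = "POS", correct   # push toward the correct answer
    # Stage 3: persuade for several turns, re-checking the stance after each.
    stances = []
    for _ in range(turns):
        history.append(("user", appeal(target)))
        answer = model(history)
        history.append(("assistant", answer))
        stances.append(answer)
    return {"initial": initial, "direction": direction, "stances": stances}


def repetition_appeal(target: str) -> str:
    # Hypothetical "simple repetition" message that just restates the target.
    return f"I am sure the answer is {target}"


def toy_model(history: List[Tuple[str, str]]) -> str:
    # Toy model: answers "A" until persuaded twice, then adopts the suggestion.
    appeals = [text for role, text in history if role == "user"][1:]
    return appeals[-1].split()[-1] if len(appeals) >= 2 else "A"


result = run_dialogue(toy_model, "Q?", correct="A", wrong="C",
                      appeal=repetition_appeal)
print(result)  # {'initial': 'A', 'direction': 'NEG', 'stances': ['A', 'C', 'C']}
```

Here the toy model starts correct, holds its answer for one turn of misinformation, and then yields, which is the kind of position change the framework is designed to record.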

Seven persuasion techniques are used: "logical appeal," "evidence-based appeal," "expert citation," "authority citation," "emotional appeal" (positive and negative), and simple repetition. Position changes are tracked over multiple turns.
This framework enables the simultaneous quantification of the model's "robustness" (ability to reject misinformation) and "acceptability" (ability to accept corrective actions).
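One way to turn logged stances into these two quantities is sketched below. The per-turn definitions used here (share of NEG dialogues still correct, share of POS dialogues now correct) are a simplified reading of the article, and the record format and the names Appeal and per_turn_rates are hypothetical; the paper's exact metric definitions may differ.

```python
from enum import Enum
from typing import Dict, List


class Appeal(Enum):
    # The seven persuasion techniques listed in the article.
    LOGICAL = "logical appeal"
    EVIDENCE = "evidence-based appeal"
    EXPERT = "expert citation"
    AUTHORITY = "authority citation"
    EMOTIONAL_POSITIVE = "emotional appeal (positive)"
    EMOTIONAL_NEGATIVE = "emotional appeal (negative)"
    REPETITION = "simple repetition"


def per_turn_rates(records: List[Dict], turns: int = 3) -> Dict[str, List[float]]:
    """Robustness: share of NEG dialogues whose answer is still correct at turn t.
    Acceptability: share of POS dialogues whose answer is correct at turn t."""
    rates = {}
    for direction, name in (("NEG", "robustness"), ("POS", "acceptability")):
        group = [r for r in records if r["direction"] == direction]
        rates[name] = [
            sum(r["stances"][t] == r["correct"] for r in group) / len(group)
            if group else float("nan")
            for t in range(turns)
        ]
    return rates


records = [
    {"direction": "NEG", "correct": "A", "stances": ["A", "A", "C"]},
    {"direction": "NEG", "correct": "B", "stances": ["B", "D", "D"]},
    {"direction": "POS", "correct": "C", "stances": ["B", "C", "C"]},
    {"direction": "POS", "correct": "A", "stances": ["D", "D", "D"]},
]
rates = per_turn_rates(records)
print(rates)  # {'robustness': [1.0, 0.5, 0.0], 'acceptability': [0.0, 0.5, 0.5]}
```

Reporting both curves side by side makes the trade-off visible: a model can score high on one only by scoring low on the other unless it actually distinguishes misinformation from valid corrections.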

As a further improvement, a learning method called "Holistic DPO" was proposed.
It uses training data that balances samples rejecting misinformation with samples accepting valid corrections, emphasizing the equilibrium between the two rather than training that only reinforces resistance.
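The data recipe can be sketched as below. The 1:1 downsampling rule, the record fields (dialogue, correct, wrong), and all function names are assumptions for illustration; the article does not give the exact mixing ratio. The dpo_loss function is the standard DPO objective evaluated on precomputed log-probabilities, included only to make the preference signal concrete.

```python
import math
import random
from typing import Dict, List


def make_pair(record: Dict[str, str], kind: str) -> Dict[str, str]:
    """One preference pair. The preferred response is always the correct answer:
    for NEG records, "wrong" is the misinformation being pushed;
    for POS records, "wrong" is the model's own earlier mistake."""
    return {"prompt": record["dialogue"], "chosen": record["correct"],
            "rejected": record["wrong"], "kind": kind}


def balanced_pairs(neg_records: List[Dict[str, str]],
                   pos_records: List[Dict[str, str]],
                   seed: int = 0) -> List[Dict[str, str]]:
    """Assumed balancing rule: downsample the larger side to a 1:1 mix."""
    rng = random.Random(seed)
    n = min(len(neg_records), len(pos_records))
    pairs = [make_pair(r, "NEG") for r in rng.sample(neg_records, n)]
    pairs += [make_pair(r, "POS") for r in rng.sample(pos_records, n)]
    rng.shuffle(pairs)
    return pairs


def dpo_loss(pol_chosen: float, pol_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one pair from summed response log-probabilities:
    -log sigmoid(beta * [(pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)])."""
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    return math.log1p(math.exp(-beta * margin))


neg = [{"dialogue": f"neg-{i}", "correct": "A", "wrong": "C"} for i in range(3)]
pos = [{"dialogue": "pos-0", "correct": "B", "wrong": "D"}]
pairs = balanced_pairs(neg, pos)
print(sorted(p["kind"] for p in pairs))  # ['NEG', 'POS']
print(round(dpo_loss(0.0, 0.0, 0.0, 0.0), 4))  # 0.6931
```

Keeping only the NEG pairs in this mix would roughly correspond to the resistance-only style of training that the authors contrast with Holistic DPO.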

Experiment

In the experiment, the authors used a total of 2,246 questions from MMLU-Pro and SALAD-Bench and ran three-turn persuasion dialogues on nine models, including GPT-4o and Llama-3.1-8B.
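For a rough sense of scale, the arithmetic below multiplies out the figures stated above. It assumes, without confirmation from the article, that every question is run under every one of the seven techniques for every model; the real experiment grid may be smaller.

```python
QUESTIONS = 2_246   # MMLU-Pro + SALAD-Bench questions (from the article)
MODELS = 9          # evaluated models (from the article)
TECHNIQUES = 7      # persuasion techniques (from the article)
TURNS = 3           # persuasion turns per dialogue (from the article)

# Assumption: every question is crossed with every technique and every model.
dialogues = QUESTIONS * TECHNIQUES * MODELS
# Each dialogue = one initial answer + one answer per persuasion turn.
responses = dialogues * (1 + TURNS)
print(dialogues, responses)  # 141498 565992
```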

The results showed that even the latest high-performance models were vulnerable to misinformation in the knowledge domain: GPT-4o's rate of retaining the correct answer dropped to 27.32% after three turns.
In contrast, smaller open-source models were more willing to accept corrections but were also extremely vulnerable to misinformation.

The authors also found that simple repetition alone is highly persuasive and that the tendency to pander to the user (sycophancy) is stronger in newer open-source models.
Holistic DPO, tested as a remedy, raised misinformation resistance on SALAD-Bench from 4.21% to 76.54% while keeping acceptance of valid corrections above 70%.

The authors judged this balanced approach to be more practical than training that only strengthens resistance, and showed that it contributes substantially to model reliability.


nakata