[DetectGPT] Is The Author Of The Text AI Or Human? A Proposal On How To Tell The Difference

Zero Shot 23/01/2024

3 main points
✔️ Large language models have become highly capable and are now widely used in settings such as education and journalism.
✔️ This study proposes a new method for determining whether a passage was written by a machine or a human, and shows that it outperforms existing zero-shot detectors.
✔️ Detection relies on the model's log probabilities and on how they change under small perturbations of the text; improving both is left for future research.

DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
written by Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, Chelsea Finn
(Submitted on 26 Jan 2023 (v1), last revised 23 Jul 2023 (this version, v2))
Comments: ICML 2023
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The images used in this article are from the paper, the introductory slides, or were created based on them.

Summary

This paper proposes a new method for detecting text generated by a large language model (LLM). The key observation is that text sampled from a model tends to occupy a characteristic region of that model's log probability function. Specifically, model-generated passages tend to lie in what are called "negative curvature regions", where small rewrites of the passage lower the log probability on average.

Based on this observation, the researchers propose a detection method called DetectGPT, which defines a curvature-based criterion for deciding whether a passage was generated by a given model. What makes it interesting is that it requires no trained classifier: DetectGPT uses only the log probabilities computed by the model of interest and random perturbations of the passage produced by another generic pre-trained language model (for example, T5).

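In short, DetectGPT is a practical method for detecting model-generated text. The authors report that it outperforms previous zero-shot methods and is particularly promising for detecting fake news generated by large models: for fake news articles generated by the 20B-parameter GPT-NeoX, they report an AUROC of 0.95, compared with 0.81 for the strongest zero-shot baseline.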

The figure above compares the curvature of the log probability function around machine-generated text x_fake ~ p_θ(x) (left) and human-written text x_real ~ p_human(x) (right). Machine-generated text tends to lie in negative curvature regions of log p_θ(x), where nearby samples have lower log probability on average. Human-written text, on the other hand, tends not to occupy regions of clearly negative curvature, and nearby samples may have either higher or lower log probability.

Introduction

As outlined above, this paper describes DetectGPT, a new method for detecting text produced by large language models (LLMs). LLMs can produce fluent and persuasive responses, but those responses are sometimes factually incorrect. This becomes a problem when LLM output is substituted for human work, for example in student essays and journalism, without readers being able to tell.

DetectGPT is a novel zero-shot method for detecting LLM-generated text, based on the hypothesis that model-generated text tends to lie in negative curvature regions of the model's log probability function. The method does not require training a separate classifier or collecting a dataset of real and generated text, and it has shown higher accuracy than existing zero-shot methods in detecting machine-generated text.

The main contributions of the paper are the identification and empirical testing of the hypothesis that the curvature of a model's log probability function is significantly more negative at model samples than at human-written text, and the proposal of DetectGPT itself. The method detects model-generated text using an approximation of the trace of the Hessian of the log probability function.

Figure 1 illustrates the procedure DetectGPT uses to determine whether a passage was generated by a particular large language model (for example, GPT-3). First, to evaluate a candidate passage x, DetectGPT uses a generic pre-trained model such as T5 to generate perturbations x̃_i, which are versions of the passage with minor modifications. It then compares the log probability of the original passage x under the LLM with the log probability of each perturbed sample x̃_i. If the average log ratio is high, the passage was most likely generated by the source model.
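The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `detectgpt_score`, `toy_log_prob`, and `make_toy_perturb` are hypothetical names, the toy unigram table stands in for a real LLM's log probability, and random word replacement stands in for T5 mask filling.

```python
import math
import random
from statistics import mean
from typing import Callable


def detectgpt_score(
    text: str,
    log_prob: Callable[[str], float],
    perturb: Callable[[str], str],
    n_perturbations: int = 20,
) -> float:
    """Log probability of the original text minus the mean log
    probability of its perturbed versions (average log ratio)."""
    original = log_prob(text)
    perturbed = [log_prob(perturb(text)) for _ in range(n_perturbations)]
    return original - mean(perturbed)


# --- Toy stand-ins (assumptions for illustration only) ---------------
TOY_UNIGRAM = {"the": 0.4, "cat": 0.3, "sat": 0.2, "zebra": 0.05, "quark": 0.05}


def toy_log_prob(text: str) -> float:
    """Stand-in for the source model: a unigram log probability."""
    return sum(math.log(TOY_UNIGRAM.get(w, 1e-3)) for w in text.split())


def make_toy_perturb(rng: random.Random) -> Callable[[str], str]:
    """Stand-in for mask filling: replace one random word."""
    vocab = list(TOY_UNIGRAM)

    def perturb(text: str) -> str:
        tokens = text.split()
        tokens[rng.randrange(len(tokens))] = rng.choice(vocab)
        return " ".join(tokens)

    return perturb


if __name__ == "__main__":
    perturb = make_toy_perturb(random.Random(0))
    # Text at a peak of the toy model: perturbations can only lower it.
    print(detectgpt_score("the the the the", toy_log_prob, perturb))
    # Text far from any peak: perturbations tend to raise it.
    print(detectgpt_score("quark zebra quark zebra", toy_log_prob, perturb))
```

In this toy setting, a passage that sits at a peak of the scoring model gets a positive score, while a passage far from any peak gets a negative one. With a real model, `log_prob` would sum token log probabilities under the candidate LLM, and `perturb` would mask and refill spans with a model such as T5.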

Related Research

As large language models (LLMs) continue to improve, methods for identifying machine-generated text become increasingly important. Previous research has trained detectors to find text generated by a particular model, but has also found that such detectors are strongly biased toward the data and models they were trained on.

Several methods have been proposed, but DetectGPT differs from them in that it detects model-generated text from the local shape (curvature) of the model's log probability function around the passage. "Zero-shot" here means that no additional training is required for the detection task. This makes it possible to find machine-generated text without relying on a separately trained detector or a labeled dataset.

The paper's experiments show that DetectGPT is more accurate than previous zero-shot methods. The work situates itself among advances and open challenges in machine-generated text detection, and notes that even better detection methods will be needed in future research.

DetectGPT

DetectGPT is a method for deciding whether a passage was generated by a particular language model. For example, it can be used to check whether a given text was written by an AI, and the paper proposes a new criterion for making that decision. The method needs to evaluate log probabilities under the candidate source model, but it does not require training a detector or collecting examples of that model's output. It can therefore be applied to new models and domains without additional training.

DetectGPT rests on the perturbation discrepancy gap hypothesis. The perturbation discrepancy of a passage is the difference between its log probability under the source model and the average log probability of slightly rewritten (perturbed) versions of it. The hypothesis states that this discrepancy tends to be large and positive for model-generated text, because perturbing a model sample usually lowers its log probability, whereas for human-written text it tends to be smaller, because perturbations may raise or lower the log probability. DetectGPT estimates the discrepancy numerically for a candidate passage and flags the passage as machine-generated when the value is large. The paper validates this criterion experimentally.
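For reference, the perturbation discrepancy can be written compactly as follows. The notation is this article's reading of the definition: p_θ is the source model, q(· | x) is the perturbation function (for example, T5 mask filling), x̃ is a perturbed passage, and k is the number of perturbations drawn. The second line is the sample-average estimate that would be computed in practice.

```latex
d(x, p_\theta, q) \;=\; \log p_\theta(x) \;-\; \mathbb{E}_{\tilde{x} \sim q(\cdot \mid x)} \bigl[ \log p_\theta(\tilde{x}) \bigr]

\hat{d}(x, p_\theta, q) \;=\; \log p_\theta(x) \;-\; \frac{1}{k} \sum_{i=1}^{k} \log p_\theta(\tilde{x}_i),
\qquad \tilde{x}_i \sim q(\cdot \mid x)
```

Under the hypothesis, d tends to be large and positive when x is a sample from p_θ and closer to zero when x is human-written.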

Experiments on the distribution of perturbation discrepancies show a clear difference between human-written articles and model samples: the discrepancy tends to be larger for model samples. DetectGPT uses this difference to detect machine-generated text.

Figure 3 compares the perturbation discrepancy in log probability between machine-generated text and human-written text. It shows the average decrease in log probability after a passage is rewritten, and the machine-generated text shows consistently higher discrepancies. Each plot shows the distribution of the perturbation discrepancy for human-written articles from the XSum dataset and machine-generated articles of the same word length, generated with different large models (GPT-2, GPT-Neo-2.7B, GPT-J, GPT-NeoX). The discrepancies are estimated using perturbations sampled from T5-3B.

Experiment

As described above, DetectGPT detects machine-generated text without a specially trained detector: it is a zero-shot method and can therefore handle situations for which no detector has been trained. Experiments show that DetectGPT outperforms other zero-shot methods, particularly on XSum news stories and SQuAD contexts. It is also more widely applicable than supervised methods and can handle different languages and topics.

DetectGPT remains fairly robust when machine-generated text has been partially revised, and it works with a variety of decoding strategies. It also retains useful detection performance when the source model is unknown and a different model is used for scoring, although accuracy is highest when the scoring model matches the source model. Experiments show that the sizes of the source model and of the mask-filling model affect DetectGPT's performance, as does the number of perturbations. The nature of the data and the length of the text also affect detection, which suggests that choosing an appropriate threshold is particularly important.

This study shows that DetectGPT is adaptable to different situations and models and is a promising method for detecting machine-generated text.

In Figure 4, supervised machine-generated text detectors trained on large datasets of real and generated text perform as well as or better than DetectGPT on in-distribution text (top row). However, the zero-shot approach works out of the box in new domains (bottom row), for example medical texts from PubMed and German news data from WMT16. In these domains, the supervised detectors can fail because of distribution shift.

Figure 5 evaluates the impact of editing on detection. Human edits are simulated by randomly masking portions of a model sample and replacing them with T5-3B-generated text. The results show that DetectGPT consistently provides the most accurate detection, while the performance of the other methods degrades as the edited fraction increases. This experiment uses the XSum dataset.
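The masking step used to simulate edits (and, more generally, to build perturbations) can be sketched as below. This is an illustrative sketch only: `mask_random_spans` is a hypothetical helper, the fixed span length and the grid-based span placement are assumptions made to keep spans from overlapping, and the filling step with a T5 model is left out. The `<extra_id_N>` placeholders follow T5's sentinel-token convention.

```python
import random
from typing import List


def mask_random_spans(
    words: List[str], fraction: float, span_len: int, rng: random.Random
) -> str:
    """Replace random non-overlapping spans with T5-style sentinels
    until roughly `fraction` of the words are masked."""
    n_spans = round(len(words) * fraction / span_len)
    # Candidate span starts lie on a grid so spans never overlap.
    starts = list(range(0, len(words) - span_len + 1, span_len))
    chosen = set(rng.sample(starts, min(n_spans, len(starts))))

    out: List[str] = []
    sentinel = 0
    i = 0
    while i < len(words):
        if i in chosen:
            # One sentinel token stands in for the whole span.
            out.append(f"<extra_id_{sentinel}>")
            sentinel += 1
            i += span_len
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog near the river bank today"
    print(mask_random_spans(text.split(), 0.3, 2, random.Random(0)))
```

A mask-filling model would then generate replacement text for each sentinel, which yields either a perturbed passage for scoring or a simulated human edit.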

Figure 6 shows that DetectGPT performs best when samples are scored with the same model that generated them. On the other hand, the column averages suggest that some models (GPT-Neo, GPT-2) may be better "scorers" than others (GPT-J). White values represent mean AUROC (standard error) across XSum, SQuAD, and WritingPrompts; black values represent row/column means.
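Since the results here are reported as AUROC, a small sketch of how that metric relates to detector scores may help. AUROC equals the probability that a randomly chosen machine-generated passage receives a higher score than a randomly chosen human-written one, with ties counted as one half. The function name `auroc` and the example scores are made up for illustration; in practice a library routine would normally be used instead of this quadratic loop.

```python
from typing import Sequence


def auroc(scores_machine: Sequence[float], scores_human: Sequence[float]) -> float:
    """Fraction of (machine, human) pairs ranked correctly by the score;
    ties count as half a correct pair."""
    wins = 0.0
    for m in scores_machine:
        for h in scores_human:
            if m > h:
                wins += 1.0
            elif m == h:
                wins += 0.5
    return wins / (len(scores_machine) * len(scores_human))


if __name__ == "__main__":
    # Hypothetical detector scores, not results from the paper.
    print(auroc([0.9, 0.4], [0.5, 0.1]))
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why the metric does not depend on choosing a particular decision threshold.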

Figure 7 shows a clear relationship between the capacity of the mask-filling model and detection performance across source model scales. Random mask filling performs poorly, which suggests that the perturbation function needs to produce samples that stay on the data manifold. The curves show AUROC scores on 200 SQuAD contexts.

Figure 8 shows how the number of perturbations used to estimate the perturbation discrepancy affects detection. AUROC is plotted against the number of perturbations for GPT-2 (left) and GPT-J (right). Averaging up to 100 perturbations significantly improves the reliability of DetectGPT. The perturbations were sampled from T5-large.
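One way to see why more perturbations help is that the average perturbed log probability is a sample mean, and sample means get less noisy as the sample grows. The toy simulation below illustrates only this statistical point: the Gaussian distribution and its parameters are arbitrary assumptions, `estimate_spread` is a hypothetical helper, and none of the numbers come from the paper.

```python
import random
from statistics import mean, pstdev


def estimate_spread(k: int, trials: int, rng: random.Random) -> float:
    """Standard deviation, across repeated trials, of a k-sample
    estimate of the mean perturbed log probability."""
    estimates = []
    for _ in range(trials):
        # Toy assumption: perturbed log probs ~ Normal(-100, 5).
        samples = [rng.gauss(-100.0, 5.0) for _ in range(k)]
        estimates.append(mean(samples))
    return pstdev(estimates)


if __name__ == "__main__":
    rng = random.Random(0)
    for k in (1, 10, 100):
        print(k, estimate_spread(k, trials=500, rng=rng))
```

The noise in the estimate shrinks roughly with the square root of k, so the gains from additional perturbations taper off, which is consistent with the reported improvement when averaging up to 100 perturbations.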

Conclusion

As large language models continue to evolve, they are increasingly being used in education, journalism, the arts, and other fields. Because their output is fluent but not always accurate, using them responsibly requires tools that can tell whether a text was machine-generated. This study focused on zero-shot machine-generated text detection and proposed a method that uses only the raw log probabilities of the model to evaluate candidate text. Experimental results showed better performance than existing zero-shot detection methods. It was also found that detection is affected by the quality of the model's log probability function and of the perturbation function, and improving these factors is a direction for future research. Ultimately, it is hoped that the results will help to find effective methods for mitigating the potential harms posed by machine-generated media.