
ChatGPT Translation Performance

Computation And Language

3 main points
✔️ Examines the translation capabilities of ChatGPT, a machine learning model.
✔️ Proposes a strategy called pivot prompting to improve ChatGPT's performance.
✔️ Shows further improvement from introducing the GPT-4 engine.

Is ChatGPT A Good Translator? Yes With GPT-4 As The Engine
written by Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Xing Wang, Shuming Shi, Zhaopeng Tu
(Submitted on 20 Jan 2023)
Analyzed/compared the outputs between ChatGPT and Google Translate; both automatic and human evaluation
Subjects: Computation and Language (cs.CL)


The images used in this article are from the paper, the introductory slides, or were created based on them.


This paper examines the translation capabilities of ChatGPT, a machine learning model. The evaluation covers three aspects: the effect of translation prompts, multilingual translation, and translation robustness (i.e., performance across different domains and kinds of text). The choice of prompt is found to affect translation quality.

Evaluated on several benchmarks, ChatGPT performs nearly as well as commercial translation tools on high-resource European languages, but lags behind on low-resource and distant languages. In terms of robustness, it trails the commercial systems on biomedical abstracts and online community comments, while its results on spoken language are quite good.

In addition, a strategy called pivot prompting is proposed to improve ChatGPT's performance on distant languages: ChatGPT is asked to translate the source text into a high-resource pivot language first, and then into the target language. Separately, with the introduction of the GPT-4 engine, ChatGPT's translation capabilities are greatly enhanced, reaching quality comparable to commercial translation tools even for distant languages.

Finally, ChatGPT with GPT-4 has been shown to produce more reliable translations with fewer errors than previous versions. In short, ChatGPT is increasingly establishing itself as a superior translator.


ChatGPT is a chatbot developed by OpenAI. It is a sibling model of InstructGPT (an instruction-following language model from OpenAI) and is designed to provide detailed responses to the instructions in a prompt. ChatGPT is interactive: it can answer follow-up questions, admit mistakes, and reject inappropriate requests. It integrates a variety of natural language processing capabilities, including question answering, storytelling, logical reasoning, code debugging, and machine translation.

Translation prompts tell the model to start translating, and their wording may affect translation quality. For multilingual translation, the authors evaluate ChatGPT's performance on different language pairs, taking into account differences in available resources and language families.

Finally, ChatGPT's translation capabilities will be re-evaluated when using the improved engine, GPT-4, and it will be shown that the use of GPT-4 significantly improves ChatGPT's translation performance. This will enable ChatGPT to provide the same quality as commercial translation products.

Evaluation settings

The evaluation settings focus on comparing ChatGPT with commercial translation products. They consist of two parts: the baselines and the test data. The baselines are the systems against which ChatGPT is compared: Google Translate, DeepL Translate, and Tencent TranSmart. These commercial systems support translation in 133, 29, and 16 languages, respectively.

The test data includes the Flores101 multilingual translation set, the WMT19 Biomedical Translation Task set (Bio), and the WMT20 Robustness Task sets (Rob2 and Rob3). The Bio test set consists of abstracts of medical documents, the Rob2 set is extracted from social media comments, and the Rob3 set comes from a crowdsourced speech recognition corpus. These test sets are used as the basis for evaluating the translation capabilities of ChatGPT and the other translation products.

The main evaluation metric is the BLEU score, which measures the n-gram overlap between a translation and a reference translation and is expressed on a scale of 0 to 100 (higher is better). Other metrics such as ChrF++ (a character n-gram F-score extended with word n-grams) and TER (Translation Edit Rate, where lower is better) are also reported.
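To make the idea behind BLEU concrete, here is a minimal Python sketch of a simplified, single-reference, sentence-level variant. It is not the scoring tool used in the paper: the function names, the whitespace tokenization, and the absence of smoothing are all assumptions made for illustration, and the numbers it produces will not match those of a standard toolkit.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(hypothesis, reference, max_n=4):
    """Simplified sentence-level BLEU on a 0-100 scale (illustrative only).

    Assumptions: whitespace tokenization, one reference, no smoothing.
    If any n-gram order has no match, the score is 0.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        # Clipped matches: an n-gram is credited at most as many times
        # as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram])
                      for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if total == 0 or matches == 0:
            return 0.0
        log_precision_sum += math.log(matches / total)

    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))

    # Geometric mean of the n-gram precisions, scaled to 0-100.
    return 100.0 * brevity_penalty * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    reference = "the cat sat on a mat"
    print(simple_bleu("the cat sat on a mat", reference))
    print(simple_bleu("the cat sat on the mat", reference))
```

An exact match scores 100, and the score falls as fewer n-grams of the hypothesis appear in the reference. Reported BLEU scores are normally computed over a whole test set with a standardized tokenizer, so this sketch only conveys the principle.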

Prompts are designed to trigger ChatGPT's translation function, since ChatGPT is a general chat model rather than a dedicated translation system. Each prompt asks ChatGPT to translate the given sentences, and the source and target languages named in the prompt are changed to test different language pairs.
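As a rough illustration of how such a prompt can be parameterized by language pair, here is a small Python sketch. The function name `build_translation_prompt` and the exact template wording are assumptions for this example, not a quotation of the prompts used in the paper.

```python
def build_translation_prompt(source_lang, target_lang, sentences):
    """Build a direct translation prompt for a chat model.

    The template wording is a hypothetical example, not the paper's prompt.
    """
    if not sentences:
        raise ValueError("at least one sentence is required")
    header = f"Translate these sentences from {source_lang} to {target_lang}:"
    # One sentence per line, after the instruction line.
    return "\n".join([header, *sentences])


if __name__ == "__main__":
    print(build_translation_prompt(
        "German", "English", ["Guten Morgen.", "Wie geht es dir?"]))
```

Because the languages are plain parameters, the same template can be reused for every translation direction in a test set, which keeps the comparison between language pairs consistent.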

This evaluation setting provides a more comprehensive assessment of ChatGPT's translation capabilities and allows for comparison with other translation systems.

ChatGPT for machine translation (MT)

Three steps were taken to improve and analyze ChatGPT for MT. First, a strategy called pivot prompting was proposed to improve translation quality between distant languages. The source text is first translated into a pivot language (usually English, which is high-resource) and then into the target language. This reduces the difficulty caused by the lack of resources for direct translation between such language pairs.
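The pivot prompting idea can be sketched as a single prompt that asks for the pivot translation first and the target translation second. The Python sketch below is only an illustration: the function name `build_pivot_prompt`, its parameters, and the template wording are assumptions and may differ from the prompt used in the paper.

```python
def build_pivot_prompt(target_lang, sentences, pivot_lang="English"):
    """Build a pivot prompt: translate into the pivot language, then the target.

    The template wording is a hypothetical example, not the paper's prompt.
    """
    if not sentences:
        raise ValueError("at least one sentence is required")
    if pivot_lang == target_lang:
        # Pivoting through the target language itself adds nothing.
        raise ValueError("pivot and target languages must differ")
    header = (
        f"Please provide the {pivot_lang} translation first and then "
        f"the {target_lang} translation for these sentences one by one:"
    )
    return "\n".join([header, *sentences])


if __name__ == "__main__":
    # German -> (English) -> Chinese, with English as the pivot.
    print(build_pivot_prompt("Chinese", ["Guten Morgen."]))
```

Asking for both translations in one prompt lets the model condition the target translation on its own pivot translation. A two-call variant, where the pivot output is fed back in a second request, would be another reasonable design.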

Next, the GPT-4 engine was introduced. It is more powerful than the GPT-3.5 model behind ChatGPT and shows significant performance gains in all four translation directions tested.

In addition, automatic and manual analysis of the translation output was performed to compare the translation results of ChatGPT and GPT-4. The results show that GPT-4 makes fewer over-translation and mistranslation errors and has better translation performance overall.

With these improvements, ChatGPT for MT may provide higher-quality translation and help facilitate communication between different languages. In the following table, comparisons are made with commercial systems and with GPT-4 (De: German, En: English, Zh: Chinese, Ro: Romanian).


In this study, ChatGPT's machine translation capabilities were tested. The results showed that while ChatGPT could compete with commercial products on major European languages, it was inferior on low-resource and distant languages. However, a new strategy called pivot prompting showed the potential to improve translation between distant languages. Furthermore, the introduction of the GPT-4 engine significantly improved ChatGPT's translation performance, to a level comparable to commercial products. With GPT-4 as the engine, ChatGPT can therefore be considered a good translator.

The results of this study show that ChatGPT has made great strides in machine translation, but there is still room for improvement. Further gains in translation quality and support for a wider range of languages are desirable. ChatGPT is still evolving and may establish itself as an even better translator in the future.
