Catch up on the latest AI articles

RakutenAI-7B

RakutenAI-7B" Pioneers The Frontiers Of Large-scale Language Models Specialized For Japanese

Large Language Models

3 main points
✔️ Japanese-specific, based on Mistral architecture, with expanded vocabulary and highly accurate tokenization
✔️ Demonstrates better performance than other models in Japanese and English tests using LM-Harness
✔️ Released under Apache 2.0 license, but beware of bias and inaccurate output when using it

RakutenAI-7B: Extending Large Language Models for Japanese
written by Rakuten Group Inc., Aaron Levine, Connie Huang, Chenguang Wang, Eduardo Batista, Ewa Szymanska, Hongyi Ding, Hou Wei Chou, Jean-François Pessiot, Johanes Effendi, Justin Chiu, Kai Torben Ohlhus, Karan Chopra, Keiji Shinzato, Koji Murakami, Lee Xiong, Lei Chen, Maki Kubota, Maksim Tkachenko, Miroku Lee, Naoki Takahashi, Prathyusha Jwalapuram, Ryutaro Tatsushima, Saurabh Jain, Sunil Kumar Yadav, Ting Cai, Wei-Te Chen, Yandi Xia, Yuki Nakayama, Yutaka Higashiyama
(Submitted on 21 Mar 2024)
Comments: Published on arxiv.
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)


The images used in this article are from the paper, the introductory slides, or were created based on them.

Summary

The field of natural language processing (NLP) has evolved significantly with the recent "pre-train, prompt, and predict" paradigm. This new trend has accelerated the development of large language models (LLMs) that provide high-performance solutions to many natural language processing tasks. However, while these have been studied extensively for English, efforts to address other languages, including Japanese, have been insufficient. RakutenAI-7B was developed to fill this gap.

RakutenAI-7B is a language model specialized for Japanese language understanding and represents the state of the art in Japanese natural language processing. Based on the Mistral model architecture, it effectively reuses pre-trained model weights and outperforms other models in Japanese language understanding. The model achieves the highest scores on Japanese comprehension benchmarks compared with comparable models such as OpenCalm, Elyza, Youri, Nekomata, and Swallow, while maintaining competitive performance in English.

In the development of RakutenAI-7B, the Mistral vocabulary was expanded from 32k to 48k in order to improve the accuracy of Japanese tokenization, thereby allowing more information to be represented with fewer tokens. The goal of this paper is to provide a more affordable and efficient Japanese model that can be applied to a wide variety of applications. The model is published under the Apache 2.0 license and is freely accessible and usable by anyone (https://huggingface.co/Rakuten/RakutenAI-7B).
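As an illustration of the open release, the following is a minimal sketch of loading the published checkpoint with Hugging Face transformers; the repository id comes from the link above, while the dtype and device settings are assumptions about the runtime environment.

```python
# Minimal sketch (not from the paper): loading the released checkpoint with
# Hugging Face transformers. "Rakuten/RakutenAI-7B" is the repository named in
# the article; bfloat16 on a GPU is an assumption about the available hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Rakuten/RakutenAI-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: a GPU with bfloat16 support
    device_map="auto",
)

prompt = "楽天グループの主な事業は"  # "Rakuten Group's main businesses are ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```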

This article introduces the development background of RakutenAI-7B, which advances Japanese text processing, and its features.

Overview of RakutenAI-7B

This section provides an overview of RakutenAI-7B. The original Mistral tokenizer may split a single Japanese character into multiple tokens, which poses two challenges: it limits the amount of Japanese text that fits in the context window and increases the computational cost of generation. This stems largely from the complexity of kanji. To solve this problem, RakutenAI-7B introduces 16k additional tokens, expanding the vocabulary to 48k, which makes Japanese text processing more efficient.
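To make the effect of the expanded vocabulary concrete, the sketch below compares token counts for the same Japanese sentence under the original Mistral tokenizer (32k vocabulary) and the extended RakutenAI-7B tokenizer (48k vocabulary). The Hugging Face repository ids and the sample sentence are assumptions for illustration; exact counts depend on the input text.

```python
# Sketch: fewer tokens per Japanese sentence means longer effective context and
# cheaper generation. Repository ids are assumptions based on the public hubs.
from transformers import AutoTokenizer

text = "自然言語処理の研究が急速に進んでいます。"  # "NLP research is advancing rapidly."

tokenizers = {
    "Mistral-7B (32k vocab)": AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1"),
    "RakutenAI-7B (48k vocab)": AutoTokenizer.from_pretrained("Rakuten/RakutenAI-7B"),
}

for name, tok in tokenizers.items():
    ids = tok.encode(text, add_special_tokens=False)
    print(f"{name}: {len(ids)} tokens -> {tok.convert_ids_to_tokens(ids)}")
```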

The quality of pre-training data is also critical to improving the performance of large language models. In this paper, data filtering techniques are developed to improve the quality of internet-scale datasets, and the models are trained on approximately 175 billion tokens of this filtered data, resulting in more relevant output.
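The paper does not publish the filtering pipeline itself, so the following is only a toy illustration of the kind of heuristic web-text filtering commonly applied to internet-scale corpora (minimum length, symbol ratio, repeated lines); the thresholds and rules are illustrative assumptions, not Rakuten's actual criteria.

```python
# Toy illustration of heuristic quality filtering for web documents.
# None of the thresholds below come from the paper.
import re

def looks_clean(doc: str,
                min_chars: int = 200,
                max_symbol_ratio: float = 0.3,
                max_dup_line_ratio: float = 0.3) -> bool:
    """Return True if a document passes simple quality heuristics."""
    if len(doc) < min_chars:
        return False
    # Reject documents dominated by symbols/markup rather than prose.
    symbols = len(re.findall(r"[^\w\s]", doc))
    if symbols / len(doc) > max_symbol_ratio:
        return False
    # Reject documents with many repeated lines (menus, boilerplate).
    lines = [line.strip() for line in doc.splitlines() if line.strip()]
    if lines and 1 - len(set(lines)) / len(lines) > max_dup_line_ratio:
        return False
    return True

raw_corpus = ["..."]  # placeholder: iterable of raw web documents
filtered_corpus = [doc for doc in raw_corpus if looks_clean(doc)]
```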

In addition, RakutenAI-7B-instruct and RakutenAI-7B-chat were developed from the foundation model through instruction fine-tuning. This allows the models to follow instructions more precisely and improves their ability to generate natural conversations. Additional tuning has been applied to ensure safety and to suppress the generation of inappropriate content. However, attention must still be paid to the possibility of unintended behavior, and constant monitoring of the models' output and adherence to ethical and social standards is required.
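The article does not describe the fine-tuning recipe, so the sketch below only shows the general idea behind instruction fine-tuning data: (instruction, response) pairs serialized into training text. The field names and the template are illustrative assumptions, not Rakuten's actual format.

```python
# Toy illustration of how instruction-tuning examples are commonly serialized
# into training strings. Template and field names are assumptions.
examples = [
    {
        "instruction": "次の文を英語に翻訳してください。",  # "Translate into English."
        "input": "今日は良い天気です。",
        "output": "The weather is nice today.",
    },
    {
        "instruction": "大規模言語モデルを一文で説明してください。",  # "Explain LLMs in one sentence."
        "input": "",
        "output": "大量のテキストで事前学習され、言語の理解と生成を行うニューラルネットワークです。",
    },
]

def to_training_text(example: dict) -> str:
    """Serialize one (instruction, input, output) example into a training string."""
    user_turn = example["instruction"]
    if example["input"]:
        user_turn += "\n" + example["input"]
    return f"USER: {user_turn}\nASSISTANT: {example['output']}"

for ex in examples:
    print(to_training_text(ex))
    print("---")
```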

Performance evaluation of RakutenAI-7B

To evaluate the performance of RakutenAI-7B, we use an evaluation harness (LM-Harness) for the Japanese and English versions of the language model. This allows for a fair comparison of the true capability of the models. The Japanese natural language processing tasks used in the evaluation range from common-sense questions to mathematical problems, and the English natural language processing tasks range from scientific questions to the ability to detect online falsehoods.

For the Japanese tasks, we use JCommonSenseQA (part of JGLUE) to validate the models' common-sense understanding and reasoning ability. Text classification and reading comprehension are validated through MARC-ja (the Japanese subset of the Multilingual Amazon Reviews Corpus) and JSQuAD (the Japanese Stanford Question Answering Dataset). In addition, the ability to answer open-domain questions and to summarize news articles is validated through JAQKET (Japanese Questions on Knowledge of Entities) and XLSUM-ja (the Japanese subset of XLSUM), while xWino (the Japanese subset of xWinograd) and MGSM (Multilingual Grade School Math) validate the ability to resolve linguistic ambiguity and solve mathematical word problems.

The English tasks use ARC (AI2 Reasoning Challenge), HellaSwag, MMLU (Massive Multitask Language Understanding), and TruthfulQA to assess the model's logical thinking, reasoning, and ability to judge truthfulness.
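For the English side of this setup, a minimal sketch using EleutherAI's lm-evaluation-harness Python API might look like the following. The task names and the simple_evaluate call are assumptions tied to recent harness versions, and the paper's Japanese evaluation additionally relies on a Japanese variant of the harness with per-task few-shot settings.

```python
# Sketch of an LM-Harness run over the English tasks named above.
# Requires `pip install lm-eval`; the API and task names may differ across versions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Rakuten/RakutenAI-7B,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2"],
)

for task, metrics in results["results"].items():
    print(task, metrics)
```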

As an evaluation technique, for multiple-choice questions the most likely choice is selected as the answer. For question-answering tasks, the model's output is checked for an exact match against the reference answer and its accuracy is measured. This process is essential to determine how well the model can generate accurate, human-like answers. The metric used for each task and the number of shots in the n-shot setting are also specified, with accuracy (acc), exact match (em), and ROUGE-2 score (rouge-2) as the evaluation criteria.
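The multiple-choice procedure described above can be sketched as follows: each candidate answer is scored by the log-likelihood the model assigns to it given the question, and the highest-scoring choice is taken as the prediction. The model id and the toy question are placeholders, and this is a conceptual sketch rather than the harness's exact implementation.

```python
# Conceptual sketch of likelihood-based multiple-choice scoring.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Rakuten/RakutenAI-7B"  # placeholder; any causal LM works here
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def choice_logprob(question: str, choice: str) -> float:
    """Sum of log-probabilities of the choice tokens, conditioned on the question."""
    # q_len approximates where the answer continuation starts in the joint encoding.
    q_len = tok(question, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(question + choice, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1].float(), dim=-1)
    token_lp = log_probs.gather(-1, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    # Keep only the positions that belong to the answer continuation.
    return token_lp[:, q_len - 1:].sum().item()

question = "日本で一番高い山は"            # "The highest mountain in Japan is"
choices = ["富士山です。", "琵琶湖です。"]  # "Mt. Fuji." / "Lake Biwa."
prediction = max(choices, key=lambda c: choice_logprob(question, c))
print(prediction)
```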

Through this evaluation, we reveal how well RakutenAI-7B performs on both Japanese and English NLP tasks.

The two tables below show RakutenAI-7B's performance on the Japanese and English LM-Harness, illustrating how RakutenAI-7B outperforms other models on the Japanese and English test sets. In particular, its average score of 62.83 on the Japanese tasks is more than 3 points higher than that of the next-best model.


In English, it also outperforms its competitors by a wide margin, recording an average score of 60.50. This consistently high performance demonstrates the balanced strength of RakutenAI-7B across a variety of tasks.

In addition, RakutenAI-7B-instruct further improves performance through instruction fine-tuning on top of the foundation model. The two tables below report the performance of RakutenAI-7B-instruct on the Japanese and English LM-Harness.

As a result, it achieved an average score of 68.74 on the Japanese LM-Harness, nearly 2 points above the closest competing model. RakutenAI-7B-instruct also performed best on the English tasks, showing a marked improvement over previously published Japanese models.


RakutenAI-7B sets a new benchmark in the field of Japanese and multilingual natural language processing. In particular, it has shown strong results in both Japanese and English, and RakutenAI-7B is expected to play an important role in the development of AI technology.

Summary

Through advanced data filtering techniques and a systematic model development approach based on curation, RakutenAI-7B delivers high-quality, consistent output for Japanese and English. The models consistently perform well across a variety of natural language processing tasks, outperforming existing publicly available Japanese models on average. In particular, RakutenAI-7B's tokenizer is specialized for processing Japanese text, which improves learning and inference speed and potentially reduces costs.

The paper provides researchers, developers, and industry professionals with the RakutenAI-7B model to promote innovation and create positive impact in a variety of areas.

On the other hand, while the models have the ability to produce human-like text on a wide variety of topics, as with all large language models they can also produce biased, inaccurate, or unsafe output and should be used with caution. Users are expected to use these models safely and responsibly.

Takumu
I have worked as a Project Manager/Product Manager and Researcher at internet advertising companies (DSP, DMP, etc.) and machine learning startups. Currently, I am a Product Manager for new business at an IT company. I also plan services utilizing data and machine learning, and conduct seminars related to machine learning and mathematics.

If you have any suggestions for improvement of the content of the article,
please contact the AI-SCHOLAR editorial team through the contact form.

Contact Us