
Transformer's Growth Is Unstoppable! Summary Of Research On Transformer Improvements Part1



3 main points
✔️ An overview of "Efficient Transformers", improved variants of the Transformer
✔️ A general classification of Efficient Transformers
✔️ Related information on Efficient Transformers

Efficient Transformers: A Survey
written by Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
(Submitted on 14 Sep 2020 (v1), last revised 16 Sep 2020 (this version, v2))
Comments: Accepted at arXiv
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

Introduction

The Transformer, proposed in the paper "Attention Is All You Need", has shown success in natural language processing, including BERT, as well as in image processing and reinforcement learning. Despite these successes, the Transformer is still not perfect.

A particularly significant challenge is the Transformer's computational cost: its self-attention scales quadratically with the input sequence length, which poses serious problems in terms of compute and memory requirements during both training and inference. For this reason, research on more efficient variants of the Transformer algorithm (Efficient Transformers) has become very active.
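The quadratic cost comes from the attention score matrix, which has one entry per pair of positions. The minimal sketch below (plain NumPy, with random matrices standing in for learned projection weights, and multi-head logic omitted) makes this concrete: for a length-n input, the score matrix is n × n.

```python
import numpy as np

def self_attention(x, d_k):
    """Plain scaled dot-product self-attention over a length-n sequence.

    Illustrative sketch only: the projection matrices are random
    stand-ins for learned parameters, and multi-head logic is omitted.
    """
    n, d = x.shape
    rng = np.random.default_rng(0)
    W_q, W_k, W_v = (rng.standard_normal((d, d_k)) for _ in range(3))
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    # The score matrix is n x n: compute and memory grow with n**2.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax (numerically stabilized).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, scores.shape

x = np.random.default_rng(1).standard_normal((128, 64))
out, score_shape = self_attention(x, d_k=64)
print(score_shape)  # (128, 128): quadratic in the sequence length 128
```

Doubling the sequence length quadruples the size of `scores`; the Efficient Transformer methods classified below (fixed patterns, low-rank methods, kernels, and so on) all aim to avoid materializing this full n × n matrix.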

Within this research area, Reformer and Synthesizer have previously been covered on this site, and many other Efficient Transformers have already been proposed. Progress on Efficient Transformers is so fast that it is very difficult to grasp the full picture.

In light of this situation, this article provides a comprehensive explanation of improvements to Transformer.

In this article, we will explain Efficient Transformers in general, and in the following articles (Part 2 and Part 3, published the day after tomorrow), we will explain each model in more specific detail.

Table of Contents

1. On the computational complexity of the Transformer
Multi-Head Self-Attention

2. Classification of Efficient Transformers
    2.1. Fixed Patterns (FP)
        Blockwise Patterns
        Strided Patterns
        Compressed Patterns
    2.2. Combination of Patterns (CP) 
    2.3. Learnable Patterns (LP)
    2.4. Memory
    2.5. Low-Rank Methods
    2.6. Kernels
    2.7. Recurrence

3. Related information on Efficient Transformers
3.1. On evaluation
3.2. Various initiatives
Weight Sharing 
Quantization / Mixed Precision 
Knowledge Distillation / Pruning 
Neural Architecture Search (NAS) 
Task Adapters

4. Specific examples of Efficient Transformers (separate articles: Part 2, Part 3)
