Transformer's Growth Is Unstoppable! Summary Of Research On Transformer Improvements Part 1
3 main points
✔️ About "Efficient Transformers," improved versions of the Transformer
✔️ About the general classification of Efficient Transformers
✔️ About topics related to Efficient Transformers
Efficient Transformers: A Survey
written by Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
(Submitted on 14 Sep 2020 (v1), last revised 16 Sep 2020 (this version, v2))
Comments: Published on arXiv
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Introduction
The Transformer, proposed in "Attention Is All You Need," has shown great success in natural language processing (most notably through BERT), as well as in image processing and reinforcement learning. Despite these successes, the Transformer is still not perfect.
A particularly significant challenge is the Transformer's computational complexity: the cost of self-attention grows with the square of the input sequence length, which poses serious problems in terms of compute and memory requirements during both training and inference. For this reason, research on more efficient Transformers ("Efficient Transformers") that improve on the original algorithm has become very active.
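To make the quadratic cost concrete, the following is a minimal NumPy sketch of single-head scaled dot-product self-attention (it is not code from the survey, and all names are illustrative). The (n, n) score matrix it builds is the source of the quadratic growth in time and memory with sequence length n.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Minimal single-head scaled dot-product self-attention (illustrative sketch).

    x: (n, d) input sequence of length n with model dimension d.
    The score matrix has shape (n, n), so time and memory grow
    quadratically with the sequence length n.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # each (n, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (n, n) <- the quadratic term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v                               # (n, d)

# Doubling the sequence length quadruples the number of attention-matrix entries.
d = 64
rng = np.random.default_rng(0)
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
for n in (512, 1024, 2048):
    x = rng.standard_normal((n, d))
    out = self_attention(x, w_q, w_k, w_v)
    print(f"n={n}: attention matrix entries = {n * n}")
```

Most Efficient Transformers discussed in this series target exactly this (n, n) score matrix, replacing it with sparse, low-rank, or kernel-based approximations.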
In this research area, Reformer and Synthesizer have been covered on this site in the past, and many other Efficient Transformers have already been proposed. Progress is so rapid that it is very difficult to grasp the full picture.
In light of this situation, this article provides a comprehensive overview of improvements to the Transformer.
This article covers Efficient Transformers in general, and the following articles (Part 2 and Part 3, the latter published the day after tomorrow) will explain each model in more specific detail.
Table of Contents
1. The Computational Cost of the Transformer
Multi-Head Self-Attention
2. Classification of Efficient Transformers
2.1. Fixed Patterns (FP)
Blockwise Patterns
Strided Patterns
Compressed Patterns
2.2. Combination of Patterns (CP)
2.3. Learnable Patterns (LP)
2.4. Memory
2.5. Low-Rank Methods
2.6. Kernels
2.7. Recurrence
3. Related Topics for Efficient Transformers
3.1. Evaluation
3.2. Other Efficiency Approaches
Weight Sharing
Quantization / Mixed Precision
Knowledge Distillation / Pruning
Neural Architecture Search (NAS)
Task Adapters
4. Specific Examples of Efficient Transformers (covered in Part 2 and Part 3)