Catch up on the latest AI articles

Transformer's Growth Is Unstoppable! Summary Of Research On Transformer Improvements Part 3

Transformer

3 main points
✔️ Introduces specific examples of Efficient Transformer models
✔️ Explains methods based on learnable patterns, low-rank factorization, kernels, and recursion
✔️ These methods reduce the complexity of attention to linear order O(N) in the best case

Efficient Transformers: A Survey
written by Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
(Submitted on 14 Sep 2020 (v1), last revised 16 Sep 2020 (this version, v2))
Comments: Accepted at arXiv
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
 
  

Introduction

Research on making Transformers more efficient by improving their algorithms (Efficient Transformers) is now very active. Progress in this area is so rapid that many Efficient Transformers have already been proposed, making it very difficult to grasp the whole picture.

In light of this situation, this article provides a comprehensive explanation of these improvements. The basics of Efficient Transformers in general, including their broad classification, are covered in a previous article. Here, we give more specific and detailed explanations of the architecture and time/space computational complexity of Efficient Transformer models proposed to date.

The models presented in this article are classified into Learnable Pattern (LP), Low-Rank Factorization (LR), Kernel (KR), and Recursion (RC) based approaches (sections 4.5 - 4.8).
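To make the linear-order claim above concrete before diving into the individual models, here is a minimal NumPy sketch of the kernel trick behind kernel-based methods such as Linear Transformers (section 4.7). The elu(x)+1 feature map follows the Linear Transformers paper, but the function names are ours and this is an illustrative sketch, not code from the survey:

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1: a simple positive feature map
    # (the one used in the Linear Transformers paper).
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    # Standard attention computes softmax(Q K^T) V, which materializes
    # the N x N matrix Q K^T: O(N^2) time and memory.
    # Replacing exp(q . k) with phi(q) . phi(k) lets us reassociate:
    #   phi(Q) (phi(K)^T V) / (phi(Q) (phi(K)^T 1))
    # so only d x d intermediates are formed: O(N) in sequence length.
    Qp, Kp = elu_feature_map(Q), elu_feature_map(K)  # (N, d) each
    KV = Kp.T @ V                                    # (d, d), O(N d^2)
    Z = Qp @ Kp.sum(axis=0)                          # (N,), row normalizers
    return (Qp @ KV) / Z[:, None]                    # (N, d), O(N d^2)

# Toy usage: sequence length N = 1000, head dimension d = 64.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((1000, 64)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (1000, 64)
```

The other families reach sub-quadratic cost by different routes: learnable patterns sparsify the N x N attention matrix, low-rank methods project it onto fewer dimensions, and recursion-based methods reuse cached representations of earlier segments.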

For an explanation of the models in the other categories, please see the previous articles in this series.

Table of Contents

1. About the computational complexity of the Transformer (explained in another article)

2. Classification of Efficient Transformers (explained in another article)

3. Related information on Efficient Transformers (explained in another article)

4. Specific examples of Efficient Transformers
 4.1. Fixed Pattern based (FP) (explained in another article)
  Memory Compressed Transformer
  Image Transformer
 4.2. Global Memory based (M) (explained in another article)
  Set Transformers
 4.3. Combinations of Fixed Patterns (explained in another article)
  Sparse Transformers
  Axial Transformers
 4.4. Combinations of Fixed Patterns and Global Memory (FP+M) (explained in another article)
  Longformer
  ETC
  BigBird
 4.5. Learnable Pattern based (LP)
  Routing Transformers
  Reformer
  Sinkhorn Transformers
 4.6. Low-Rank Factorization based (LR)
  Linformer
  Synthesizers
 4.7. Kernel based (KR)
  Performer
  Linear Transformers
 4.8. Recursion based (RC)
  Transformer-XL
  Compressive Transformers
