Everything You Need To Know About Transformer In Computer Vision! Part4/5 (Multimodal Tasks)

Transformer 27/01/2021

3 main points
✔️Explains the applications of the Transformer in computer vision
✔️Explains research examples in multimodal tasks
✔️Of 37 models in total, 9 are described in this article

Transformers in Vision: A Survey
written by Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah
(Submitted on 4 Jan 2021)
Comments: 24 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Introduction

The Transformer has shown high performance not only in natural language processing but also in many other areas. Among these, research applying the Transformer to computer vision, the field that deals with visual information, has become very popular.

In response to this interest, this series provides an extensive and detailed description of the Transformer in computer vision.

This article introduces applications of the Transformer to multimodal tasks.

A total of nine models for multimodal tasks are described.

See Parts 2, 3, and 5 for examples of research on other tasks, and Part 1 for a general description of transformers in computer vision.

Overall Structure (Table of Contents)

1. About Transformer in Computer Vision (explained in Part1)

2. Concrete Examples of Transformer in Computer Vision (Part2-5)
2.1 Transformers for Image Recognition (Part2)
2.2 Transformers for Object Detection (Part2)
2.3 Transformers for Segmentation (Part3)
2.4 Transformers for Image Generation (Part3)
2.5 Transformers for Low-level Vision (Part3)
2.6 Transformers for Multi-modal Tasks
・ViLBERT (Vision and Language BERT)
・LXMERT
・VisualBERT
・VL-BERT
・Unicoder-VL (Universal Encoder for Vision and Language)
・UNITER
・OSCAR (Object-Semantics Aligned Pre-training)
・Vokenization
・Vision-and-Language Navigation
2.7 Video Understanding (Part5)
2.8 Transformers in Low-shot Learning (Part5)
2.9 Transformers for Clustering (Part5)
2.10 Transformers for 3D Analysis (Part5)

3. Issues and Future Prospects of Transformer in Computer Vision (explained in Part1)
