Everything You Need To Know About Transformer In Computer Vision! Part 2/5 (Image Recognition And Object Detection)

Transformer 25/01/2021

3 main points
✔️Explains the applications of the Transformer in computer vision.
✔️Discusses real-world examples in image recognition and object detection tasks.
✔️Describes 9 of the 37 models in total.

Transformers in Vision: A Survey
written by Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah
(Submitted on 4 Jan 2021)
Comments: 24 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Introduction

The Transformer has shown high performance not only in natural language processing but also in many other areas. In particular, research applying the Transformer to computer vision, the field that deals with visual information, has become very popular. In response to this interest, this article gives an extensive and detailed description of the Transformer in computer vision. Here, we introduce applications of the Transformer to image recognition and object detection tasks.

Seven models for image recognition and two models for object detection are described.

See Parts 3, 4, and 5 for examples of research on other tasks, and Part 1 for a general description of Transformers in computer vision.

Overall Structure (Table of Contents)

1. About Transformer in Computer Vision (Part1)

2. A Concrete Example of Transformer in Computer Vision
　2.1 Transformers for Image Recognition
　・CCNet(Criss-cross Attention)
　・Stand-alone Self-Attention
　・Local Relation Networks
　・Attention Augmented Convolutional Networks
　・Vectorized Self-Attention
　・ViT(Vision Transformer)
　・DeiT(Data-efficient image Transformers)
　2.2 Transformers for Object Detection
　・DETR(Detection Transformer)
　・D-DETR(Deformable DETR)
　2.3 Transformers for Segmentation
　2.4 Transformers for Image Generation
　2.5 Transformers for Low-level Vision(Part3)
　2.6 Transformers for Multi-modal Tasks(Part3)
　2.7 Video Understanding(Part4)
　2.8 Transformers in Low-shot Learning(Part4)
　2.9 Transformers for Clustering(Part4)
　2.10 Transformers for 3D Analysis(Part4)