Point Cloud X Transformer!

Transformer 13/01/2021

3 main points.
✔️ Proposes a new transformer-based approach to understanding 3D point clouds
✔️ Designs a new transformer-based architecture
✔️ Achieves SOTA results on several 3D point cloud datasets

Point Transformer
written by Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip Torr, Vladlen Koltun
(Submitted on 16 Dec 2020)
Comments: Accepted to arXiv.
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Introduction

Transformers have taken over natural language processing and have been advancing the state of the art rapidly. Recent efforts to apply these self-attention networks to computer vision problems have been fruitful, so it is natural to try transformers for 3D point cloud processing as well. Moreover, 3D point clouds are sets embedded in three-dimensional space, and self-attention networks are invariant to the order of the elements in a set and to the number of elements it contains, which makes them even more suitable for 3D point cloud processing. Based on these intuitions, the paper introduces a novel transformer-based layer for 3D point cloud processing. This layer is extended into a Point Transformer network that sets a new state of the art on datasets across various domains and tasks.
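
The set property mentioned above can be checked on a toy example. The sketch below is a minimal illustration in plain Python of standard scaled dot-product self-attention with identity projections; it is not the layer proposed in the paper, and the function name `self_attention` and the random test cloud are assumptions made for this example. It shows that shuffling the input points only shuffles the outputs in the same way, and that the same function accepts any number of points.

```python
import math
import random

def self_attention(points):
    """Scaled dot-product self-attention with identity projections.

    points: list of feature vectors (lists of floats), one per point.
    Returns one output vector per input point.
    """
    dim = len(points[0])
    outputs = []
    for query in points:
        # Similarity of this point to every point in the set.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in points]
        # Numerically stable softmax over the whole set.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of all points (the values are the points themselves).
        outputs.append([sum(w * p[d] for w, p in zip(weights, points))
                        for d in range(dim)])
    return outputs

rng = random.Random(0)
cloud = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(8)]

# Present the same points in a different order.
order = list(range(len(cloud)))
rng.shuffle(order)
shuffled = [cloud[i] for i in order]

out = self_attention(cloud)
out_shuffled = self_attention(shuffled)

# Output j of the shuffled cloud matches output order[j] of the original cloud.
same = all(
    math.isclose(a, b, abs_tol=1e-9)
    for j, i in enumerate(order)
    for a, b in zip(out_shuffled[j], out[i])
)
print(same)
```

Because nothing in the function depends on the position of a point in the list, no positional bookkeeping is needed when points are added, removed, or reordered.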

Some 3D point cloud processing methods

The most common approach to processing 2D images is to arrange the pixels in matrices with a separate channel for each color and then apply convolutions to them. 3D points do not lie on such a regular grid, and several approaches exist for them. The three major approaches to 3D point cloud processing are described briefly below:

1) Projection-based networks

In this approach, the 3D points are projected onto several planes, and each of those planes is then processed with 2D CNNs. In the end, the per-plane results are combined to form the final output. The choice of projection planes heavily influences the final output, and information is lost during projection.
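
The information loss and the dependence on the chosen plane can be seen in a small sketch. The code below is only an illustration under simple assumptions (points inside the unit cube, an orthographic projection, a 4 x 4 image that keeps the nearest depth per pixel); the helper names `project_to_depth_image` and `count_filled` are hypothetical and do not come from the paper or from any library.

```python
def project_to_depth_image(points, drop_axis=2, size=4):
    """Orthographically project unit-cube points onto a size x size depth image.

    drop_axis is the coordinate (0=x, 1=y, 2=z) along which we project;
    each pixel keeps only the smallest depth that falls into it.
    """
    keep = [axis for axis in range(3) if axis != drop_axis]
    image = [[None] * size for _ in range(size)]
    for point in points:
        col = min(int(point[keep[0]] * size), size - 1)
        row = min(int(point[keep[1]] * size), size - 1)
        depth = point[drop_axis]
        if image[row][col] is None or depth < image[row][col]:
            image[row][col] = depth
    return image

def count_filled(image):
    """Number of pixels that received at least one point."""
    return sum(pixel is not None for row in image for pixel in row)

points = [
    (0.10, 0.10, 0.20),
    (0.12, 0.11, 0.90),  # same x/y pixel as the first point, but farther along z
    (0.60, 0.30, 0.50),
    (0.90, 0.90, 0.10),
]

top_view = project_to_depth_image(points, drop_axis=2)   # project along z
side_view = project_to_depth_image(points, drop_axis=1)  # project along y

print(count_filled(top_view))   # 3: the second point is hidden behind the first
print(count_filled(side_view))  # 4: from this direction every point is visible
```

The same four points yield three visible pixels from one direction and four from another, which is why the selection of planes matters.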

2) Voxel-based networks

In this approach, the 3D points are converted into voxels and 3D CNNs are deployed for further processing. Because 3D points are usually extremely sparse, most voxels are empty, and processing the full grid increases the computation and memory load. A solution is to use sparse CNNs that skip the empty voxels. Some information is also lost when the points are quantized into discrete voxels.
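
A rough sketch of both effects, sparsity and quantization, is given below. It assumes points inside the unit cube and a voxel edge of 0.1; the helpers `voxelize` and `voxel_center` are hypothetical names chosen for this example, and storing only occupied voxels in a dictionary is just one simple way to mimic what sparse methods do.

```python
from collections import defaultdict

def voxelize(points, voxel_size):
    """Group points by the integer index of the voxel that contains them."""
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        voxels[key].append((x, y, z))
    return dict(voxels)

def voxel_center(key, voxel_size):
    """Coordinates that stand in for every point stored in this voxel."""
    return tuple((index + 0.5) * voxel_size for index in key)

points = [
    (0.05, 0.05, 0.05),
    (0.07, 0.02, 0.09),  # falls into the same voxel as the first point
    (0.55, 0.42, 0.15),
    (0.95, 0.95, 0.95),
]
voxel_size = 0.1
grid_side = 10  # the unit cube split into 10 x 10 x 10 voxels

sparse = voxelize(points, voxel_size)
occupied = len(sparse)
dense_cells = grid_side ** 3

print(occupied, "of", dense_cells, "voxels are occupied")
# Both points in voxel (0, 0, 0) collapse to the same representative location.
print(voxel_center((0, 0, 0), voxel_size))
```

Only the occupied entries need to be stored and processed, while a dense grid would allocate all 1000 cells for four points.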

3) Point-based networks

In this approach, the 3D point clouds are directly processed using permutation-invariant networks composed of pointwise MLPs, pooling layers, and sampling heuristics. Unlike the other methods, the information remains intact without a significant increase in memory and computation requirements. The Point Transformer network is based on this approach.
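
As a minimal sketch of this idea, the code below applies one shared linear layer with ReLU to every point and then max-pools over the set, which is the general pattern of a pointwise MLP followed by a pooling layer. It is not the Point Transformer itself; the function names, the random weights, and the layer sizes are assumptions made for illustration.

```python
import random

def pointwise_mlp(point, weights, biases):
    """One shared linear layer followed by ReLU, applied to a single point."""
    return [max(0.0, sum(w * x for w, x in zip(row, point)) + b)
            for row, b in zip(weights, biases)]

def global_feature(points, weights, biases):
    """Apply the shared MLP to every point, then max-pool over the set."""
    per_point = [pointwise_mlp(p, weights, biases) for p in points]
    return [max(channel) for channel in zip(*per_point)]

rng = random.Random(0)
# Hypothetical layer: 3 input coordinates -> 4 output channels.
weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
biases = [rng.uniform(-1, 1) for _ in range(4)]
cloud = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(6)]

feature = global_feature(cloud, weights, biases)
feature_reversed = global_feature(list(reversed(cloud)), weights, biases)

# The pooled feature does not depend on the order of the points.
print(feature == feature_reversed)
```

The raw coordinates are consumed directly, so nothing is projected or quantized, and the cost grows with the number of points rather than with the volume of a grid.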

Thapa Samrat: I am a second-year international student from Nepal, currently studying at the Department of Electronic and Information Engineering at Osaka University. I am interested in machine learning and deep learning, so I write articles about them in my spare time.