BERT For The Poor: Shrinking Complex Models With Simple Techniques To Maximize Performance On Limited Resources!
Three main points.
✔️ Powerful NLP architectures such as BERT and XLNet are out of reach for researchers with limited computational resources.
✔️ Proposes a pruning method that can reduce model size by up to 40% while retaining up to 98% of the original performance. The resulting models are comparable to or better than DistilBERT, a lightweight version of BERT obtained by knowledge distillation, in both size and performance.
✔️ Compares BERT with XLNet and shows that XLNet is more robust to pruning.
Poor Man's BERT: Smaller and Faster Transformer Models
written by Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov
(Submitted on 8 Apr 2020)
Comments: Published on arXiv
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
In the field of natural language processing, Transformer-based pre-trained models such as BERT have been very successful. However, these models consist of a very large number of layers and have hundreds of millions of parameters.
Making models deeper and larger thus leads to better performance, but the computation requires a great deal of GPU/TPU memory. For example, BERT-large consists of 24 layers with 335 million parameters and requires at least 24GB of GPU memory. With such a large model, inference time can also be quite long, making it difficult to use in applications that require real-time processing.
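As a rough illustration of the kind of layer pruning highlighted in the key points above, the sketch below drops the top encoder layers of a pretrained BERT before fine-tuning. This is not the authors' code: it assumes the Hugging Face `transformers` library and PyTorch, and the helper `drop_top_layers` is a name introduced here for illustration only.

```python
# A minimal sketch (not the authors' implementation): remove the top
# Transformer layers of a pretrained BERT, then fine-tune the smaller
# model on a downstream task as usual.
import torch
from transformers import BertModel, BertTokenizer

def drop_top_layers(model: BertModel, num_layers_to_drop: int) -> BertModel:
    """Remove the last `num_layers_to_drop` encoder layers in place."""
    old_layers = model.encoder.layer
    keep = len(old_layers) - num_layers_to_drop
    model.encoder.layer = torch.nn.ModuleList(old_layers[:keep])
    model.config.num_hidden_layers = keep
    return model

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

print(sum(p.numel() for p in model.parameters()))  # ~110M parameters
model = drop_top_layers(model, 4)                  # keep the bottom 8 of 12 layers
print(sum(p.numel() for p in model.parameters()))  # noticeably fewer parameters

# Sanity check: the truncated encoder still produces hidden states.
inputs = tokenizer("A quick sanity check.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```

Dropping 4 of the 12 layers in BERT-base removes roughly a quarter of its parameters; how many layers can be removed while keeping accuracy, and which layers to remove, is exactly what the paper investigates.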