End-to-end Speech Translation "NeurST".

Voice Recognition 30/01/2021

3 main points
✔️ An open-source toolkit for neural speech translation.
✔️ Easy-to-use and flexible end-to-end speech translation system.
✔️ Setup for benchmarking, feature extraction, data preprocessing, distributed training and much more.

NeurST: Neural Speech Translation Toolkit
written by Chengqi Zhao, Mingxuan Wang, Lei Li
(Submitted on 18 Dec 2020 (v1))
Comments: arXiv:2012.10018 [cs.CL]
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Introduction

Neural Speech Translation (NST), which translates speech in one language into text in another, is an important application of deep learning. The common approach to NST is the cascade approach, which uses separate Automatic Speech Recognition (ASR) and Neural Machine Translation (NMT) models. This approach is prone to error propagation, i.e., a faulty transcript from the ASR model is passed on to the NMT model and is likely to produce a faulty translation. A more recent end-to-end approach directly transforms speech into translated text and therefore mitigates error propagation. It also reduces the model size, making it more suitable for deployment. Despite the impressive performance of end-to-end models, benchmark results are inconsistent across different research works. This is due to the complexity of preprocessing audio data, along with the tricky data augmentation and pre-training steps involved. The NeurST toolkit aims to solve these problems.
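The error propagation in the cascade approach can be illustrated with a minimal toy sketch. Everything in it is a hypothetical stand-in and not part of NeurST: `toy_asr` is a lookup table in place of a neural ASR model, `toy_nmt` is a word-for-word dictionary in place of a neural NMT model, and the utterance ID, transcripts and lexicon are made up for illustration.

```python
from typing import Dict

# Hypothetical toy data (not from the paper): one utterance, a correct and a
# faulty ASR transcript, and a tiny English-to-German word lexicon.
CLEAN_ASR: Dict[str, str] = {"utt1": "i see the sea"}
NOISY_ASR: Dict[str, str] = {"utt1": "i sea the sea"}  # "see" misrecognised as "sea"
LEXICON: Dict[str, str] = {"i": "ich", "see": "sehe", "the": "das", "sea": "meer"}


def toy_asr(audio_id: str, transcripts: Dict[str, str]) -> str:
    """Stand-in for an ASR model: looks up the transcript of an utterance."""
    return transcripts[audio_id]


def toy_nmt(text: str, lexicon: Dict[str, str]) -> str:
    """Stand-in for an NMT model: translates word by word."""
    return " ".join(lexicon.get(word, "<unk>") for word in text.split())


def cascade_translate(audio_id: str, transcripts: Dict[str, str],
                      lexicon: Dict[str, str]) -> str:
    """Cascade: the NMT stage only ever sees the ASR text, never the audio."""
    transcript = toy_asr(audio_id, transcripts)
    return toy_nmt(transcript, lexicon)


if __name__ == "__main__":
    # With a correct transcript, the translation is correct.
    print(cascade_translate("utt1", CLEAN_ASR, LEXICON))
    # With a faulty transcript, the error is carried into the translation,
    # because the second stage has no access to the original speech.
    print(cascade_translate("utt1", NOISY_ASR, LEXICON))
```

An end-to-end model replaces the two stages with a single model that maps audio features directly to target-language text, so there is no intermediate transcript in which a recognition error could be locked in.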

NeurST provides implementations of state-of-the-art Transformer-based models and includes feature extraction, data preprocessing, training, and inference modules, enabling researchers to reproduce the benchmark results. It is implemented in TensorFlow 2.


Thapa Samrat: I am a second-year international student from Nepal, currently studying at the Department of Electronic and Information Engineering at Osaka University. I am interested in machine learning and deep learning, so I write articles about them in my spare time.

