End-to-end Speech Translation "NeurST".
3 main points
✔️ An open-source toolkit for neural speech translation.
✔️ Easy-to-use and flexible end-to-end speech translation system.
✔️ Setup for benchmarking, feature extraction, data preprocessing, distributed training and much more.
NeurST: Neural Speech Translation Toolkit
written by Chengqi Zhao, Mingxuan Wang, Lei Li
(Submitted on 18 Dec 2020 (v1))
Comments: arXiv:2012.10018 [cs.CL]
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Introduction
Neural Speech Translation (NST), which translates speech in one language into text in another, is an important application of deep learning. The common approach to NST is a cascade of separate Automatic Speech Recognition (ASR) and Neural Machine Translation (NMT) models. This approach is prone to error propagation, i.e., a faulty transcript from the ASR model is passed on to the NMT model and degrades the translation. A more recent end-to-end approach translates speech directly into target-language text and therefore mitigates error propagation. It also reduces the model size, making it more suitable for deployment. Despite the impressive performance of end-to-end models, benchmark results are inconsistent across research works. This is due to the complexity of preprocessing audio data, which involves tricky data augmentation and pre-training. The NeurST toolkit aims to solve these problems.
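To make the error-propagation argument concrete, here is a minimal toy sketch in Python. It is not NeurST code: the lookup tables, clip IDs, and function names are all hypothetical stand-ins for real models, and the end-to-end table is correct by construction. The point is only structural: in the cascade, the translation stage sees nothing but the transcript, so it cannot recover from a recognition mistake.

```python
# Toy illustration only: dictionary lookups stand in for real models.
# All names here are hypothetical and do not come from NeurST.

# Stand-in "ASR": audio clip ID -> English transcript.
# clip_2 is deliberately misrecognized ("whether" instead of "weather").
TOY_ASR = {
    "clip_1": "good morning",
    "clip_2": "the whether is good",
}

# Stand-in "NMT": English transcript -> German text.
TOY_NMT = {
    "good morning": "guten Morgen",
    "the weather is good": "das Wetter ist gut",
}

# Stand-in end-to-end model: audio clip ID -> German text, no transcript.
TOY_E2E = {
    "clip_1": "guten Morgen",
    "clip_2": "das Wetter ist gut",
}


def cascade_translate(clip_id: str) -> str:
    """Run the ASR stand-in, then feed its transcript to the NMT stand-in."""
    transcript = TOY_ASR[clip_id]
    # The NMT stage only sees the transcript, so an ASR error propagates.
    return TOY_NMT.get(transcript, "<unk>")


def end_to_end_translate(clip_id: str) -> str:
    """Map audio directly to target text, with no intermediate transcript."""
    return TOY_E2E[clip_id]


if __name__ == "__main__":
    for clip in ("clip_1", "clip_2"):
        print(clip, "| cascade:", cascade_translate(clip),
              "| end-to-end:", end_to_end_translate(clip))
```

In this sketch the cascade fails on clip_2 because the misrecognized transcript is unknown to the translation stage, while the end-to-end path has no intermediate transcript to get wrong. Real end-to-end models still make errors of their own; the sketch only shows where the cascade's extra failure point sits.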
NeurST provides implementations of state-of-the-art Transformer-based models and includes feature extraction, data preprocessing, training, and inference modules, enabling researchers to reproduce the benchmark results. It is implemented in TensorFlow 2.