As a branch of translation technology, simultaneous speech translation has broad application value, for example in automatic subtitle generation for foreign-language videos and simultaneous interpreting at international conferences. However, compared with mature neural machine translation technology, simultaneous speech translation still faces great challenges. The traditional cascade model, composed of a speech recognition model followed by a machine translation model, has inherent disadvantages such as processing delay and error propagation. Although an end-to-end simultaneous speech translation model can avoid these problems, it must jointly handle information from both the speech and text modalities, which is not easy. In addition, previous studies have shown that in the speech recognition task, Transducer-based end-to-end models achieve not only a very low word error rate but also very low streaming latency. Therefore, this paper focuses on applying the Transformer Transducer model to the speech translation task. The main work and innovations are as follows:

First, to account for the different word-order alignment between the audio sequence and the translation sequence in speech translation, this paper proposes a new Transformer Transducer model together with two different mask structures. A Conv-Transformer neural network is used to extract audio features in the transcription network module; a unidirectional self-attention Transformer is used to encode the translation sequence in the prediction network module; and a cross-attention Transformer is used to fuse the audio and text features in the integration network module. For the inference stage, this paper also designs two corresponding streaming decoding methods, one for a low-latency setting and one for a high-accuracy setting.

Second, we conduct extensive experiments on the Transformer Transducer end-to-end simultaneous speech translation model with 
different optimization methods. (1) We study the influence of pre-training: the Transformer Transducer model parameters are initialized from a pre-trained speech recognition model and a pre-trained language model respectively, and the results are analyzed through experiments. (2) We study the influence of additional auxiliary loss functions: we experiment with an offline speech translation loss, a sequence-level Transducer loss regularization, and a translation delay loss on the Transformer Transducer model. (3) We study the influence of knowledge distillation: we apply the sequence-level knowledge distillation method and analyze its optimization effect on the Transformer Transducer model. In addition, this paper compares the Transformer Transducer model with other state-of-the-art end-to-end simultaneous speech translation models. Our model achieves very good results on the MuST-C public dataset; in particular, in the low-latency regime it gains significant improvements of 8-10 BLEU points.
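The streaming behaviour of a Transducer-style decoder, which alternates between consuming audio frames and emitting target tokens, can be sketched in a minimal, assumption-level form. This is an illustration of the general greedy Transducer decoding scheme, not the thesis implementation; the `BLANK` symbol, the toy joint function, and the per-frame token cap are all hypothetical.

```python
# Minimal sketch of greedy Transducer-style streaming decoding.
# At each encoder frame, a joint network either predicts a blank symbol
# (advance to the next audio frame) or a target token (extend the
# hypothesis while staying on the current frame).

BLANK = 0  # illustrative blank-symbol id

def greedy_transducer_decode(frames, joint, max_tokens_per_frame=3):
    """frames: sequence of encoder outputs; joint(frame, hyp) -> token id."""
    hyp = []                       # target tokens emitted so far
    for frame in frames:
        emitted = 0
        while emitted < max_tokens_per_frame:
            token = joint(frame, hyp)
            if token == BLANK:     # blank: consume the next audio frame
                break
            hyp.append(token)      # non-blank: emit and stay on this frame
            emitted += 1
    return hyp

# Toy joint network: emit the frame's "label" once, then blank.
def toy_joint(frame, hyp):
    return frame if (not hyp or hyp[-1] != frame) else BLANK

print(greedy_transducer_decode([5, 5, 7], toy_joint))  # → [5, 7]
```

Lowering `max_tokens_per_frame` (or biasing the joint toward blank) trades translation quality for latency, which is the same lever the low-latency and high-accuracy decoding settings above adjust.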
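The auxiliary objectives studied in (2) are typically combined with the main Transducer loss as a weighted multi-task objective. The following sketch assumes simple linear weighting; the weight values `w_st` and `w_delay` are hypothetical and not taken from the thesis.

```python
def combined_loss(transducer_loss, offline_st_loss, delay_loss,
                  w_st=0.3, w_delay=0.1):
    """Illustrative multi-task objective: the Transducer loss is the
    main term, while the offline speech-translation loss and the
    translation delay loss act as auxiliary regularizers.
    The weights are placeholders, not values from the thesis."""
    return transducer_loss + w_st * offline_st_loss + w_delay * delay_loss

total = combined_loss(2.0, 1.0, 4.0)  # ≈ 2.0 + 0.3*1.0 + 0.1*4.0
```

Setting `w_delay` higher pushes the model toward earlier emissions at the cost of accuracy, mirroring the delay/quality trade-off the experiments explore.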