
A Study On Low-resource Multilingual Speech Recognition Based On Transfer Learning

Posted on: 2020-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: L X Pan
Full Text: PDF
GTID: 2518306518463234
Subject: Computer technology

Abstract/Summary:
With the development of automatic speech recognition technology, both the traditional hybrid architecture and the emerging end-to-end architecture achieve good recognition results for resource-rich languages with sufficient acoustic corpora. However, for low-resource languages with relatively little data, the core problem, the lack of transcribed speech training data, still limits the construction of speech recognition systems for these languages, and their speech recognition research remains at a preliminary stage. The purpose of this thesis is to improve the speech recognition performance of low-resource languages by applying multilingual speech recognition techniques to the end-to-end architecture, based on the idea of transfer learning.

1. Building on the advantages of the end-to-end architecture, an end-to-end monolingual speech recognition model for the Lhasa dialect (ASR Transformer) is proposed, and the unique characteristics of the Lhasa dialect are exploited for the recognition task. First, this thesis proposes a pre-training strategy that adapts low-resource languages to the end-to-end architecture, which significantly alleviates the under-training problem caused by limited data. Second, it explores properties specific to written Tibetan: for the first time, the Tibetan radical is used as a highly compressed acoustic modeling unit and compared against the Tibetan character as a modeling unit, in order to improve the performance of the speech recognition system. Experiments on the Lhasa dialect database show that the self-attention-based end-to-end Transformer model achieves good results using these two pronunciation-dictionary-free modeling units together with the low-resource pre-training strategy. This approach makes it possible to build a low-resource speech recognition system quickly, without a pronunciation dictionary or a language model. In terms of performance, the best result of the monolingual end-to-end
system is a relative improvement of 6.3% over the baseline, a deep neural network model in the traditional hybrid speech recognition architecture.

2. Based on the idea of transfer learning, a multilingual ASR Transformer for end-to-end Lhasa dialect recognition is proposed. First, the model mixes the modeling units of all the languages participating in training, and all of these units are independent of any pronunciation dictionary. The model therefore dispenses entirely with pronunciation dictionaries and does not need a universal phone subset constructed per language, which is crucial for low-resource languages. Second, the model unifies language identification and speech recognition in a single model, so no language segmentation is required in advance: it learns to identify the language automatically during training and directly supports multilingual speech recognition tasks. Finally, the end-to-end architecture eliminates the Gaussian Mixture Model (GMM) alignment, decision-tree clustering, and other steps of the traditional hybrid pipeline, greatly simplifying multilingual speech recognition. In the experiments, self-integration training of the Lhasa dialect was first carried out, following an idea similar to multilingual speech recognition; it showed that training with the two modeling granularities together outperforms training with a single modeling unit. On this basis, data from four languages similar to the target language, together with the two modeling units of the target language, are used to build a multilingual Lhasa dialect speech recognition system. The best result is a relative improvement of 14.2% over the baseline system.
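To illustrate the two modeling granularities contrasted in point 1, the following is a minimal sketch, not the thesis's actual code: syllable-level units are taken as tsheg-delimited Tibetan syllables, and "radical"-level units are approximated here by individual Unicode code points (base letters, subjoined letters, vowel signs); this code-point approximation is an assumption for illustration.

```python
# Hypothetical sketch of Tibetan acoustic modeling units at two granularities.

TSHEG = "\u0F0B"  # Tibetan syllable delimiter "tsheg"

def syllable_units(text):
    """Split a Tibetan string into tsheg-delimited syllables."""
    return [s for s in text.split(TSHEG) if s]

def radical_units(text):
    """Split into individual code points (letters, subjoined letters,
    vowel signs), dropping the tsheg delimiter itself."""
    return [c for c in text if c != TSHEG]

sentence = "བོད་སྐད"  # "Tibetan language": two syllables
print(syllable_units(sentence))  # ['བོད', 'སྐད']
print(radical_units(sentence))   # ['བ', 'ོ', 'ད', 'ས', 'ྐ', 'ད']
```

The finer units yield a much smaller symbol inventory than whole syllables, which is the compression property the thesis exploits for low-resource training.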
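The merged, dictionary-free token inventory described in point 2 can be sketched as follows. This is an illustrative assumption about how such a vocabulary might be built, not the thesis's implementation: each language contributes its own grapheme-level units, and their union plus shared special symbols forms a single output layer, so no universal phone set or pronunciation lexicon is needed.

```python
# Hypothetical sketch: build one joint grapheme vocabulary over all
# training languages, so a single end-to-end model can emit any of them.

def build_joint_vocab(corpora):
    """corpora: dict mapping language name -> list of transcripts.
    Returns a token->id map over the union of all grapheme units,
    preceded by shared special symbols."""
    specials = ["<blank>", "<sos>", "<eos>", "<unk>"]
    units = set()
    for transcripts in corpora.values():
        for line in transcripts:
            units.update(line.replace(" ", ""))  # every code point is a unit
    return {tok: i for i, tok in enumerate(specials + sorted(units))}

corpora = {
    "tibetan": ["བོད་སྐད"],
    "mandarin": ["你好"],
}
vocab = build_joint_vocab(corpora)
```

Because the output layer covers all languages at once, the model is never told which language an utterance is in; it learns that distinction implicitly during training, which is how the single model can unify language identification and recognition.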
Keywords/Search Tags:Transfer learning, End-to-end, Multilingual speech recognition, Low-resource language, Lhasa dialect