
Research On Tibetan Acoustic Modeling Method Based On Sequential Memory Neural Network

Posted on: 2019-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: Q N Wang
Full Text: PDF
GTID: 2428330542997957
Subject: Information and Communication Engineering

Abstract/Summary:
With the rise of deep learning, the acoustic model, as the core component of a speech recognition system, has evolved from hybrid Gaussian Mixture Model (GMM) modeling to neural network modeling, which has brought a great leap in recognition performance. Tibetan is an important minority language in China; however, compared with English and Mandarin, Tibetan speech recognition still faces great challenges. Under these circumstances, this thesis conducts a systematic and comprehensive study centered on the structure of the Tibetan acoustic model. On the one hand, the thesis exploits the pronunciation features shared by Tibetan and Mandarin and optimizes the modeling unit, which solves the Tibetan-Mandarin bilingual speech recognition problem while improving the recognition rate. On the other hand, it proposes methods to enhance the robustness of acoustic models and to speed up their training.

Firstly, this thesis studies Tibetan-Mandarin bilingual hybrid acoustic modeling based on the end-to-end framework. On the Tibetan speech recognition task, the effects of different modeling units on the Tibetan acoustic model are explored in detail. Using pronunciation dictionaries, CTC is combined with GMM-HMM to further optimize the end-to-end acoustic model and improve the recognition rate. On Tibetan-Mandarin bilingual speech recognition tasks, the lack of a Tibetan-Mandarin pronunciation dictionary makes traditional acoustic modeling methods based on Hidden Markov Models inapplicable. Acoustic modeling based on the end-to-end framework solves this problem by using graphemes or characters, rather than phonemes, as modeling units and training the acoustic model with weight sharing. However, the sparseness of modeling units is an intractable and unavoidable problem in model training, particularly under low-resource conditions. This thesis explores two methods to address this
problem. First, Mandarin non-tonal syllables are used instead of characters as the CTC output units. Second, a noise-addition algorithm is applied to augment the Tibetan-Mandarin speech data. Experiments carried out on the hybrid IFLYTEK Tibetan-Mandarin corpus show obvious improvements with the proposed methods.

Secondly, considering the robustness of end-to-end acoustic models under low-resource conditions, this thesis proposes a multi-task learning strategy to enhance the robustness of the acoustic model. Acoustic modeling tasks with phonemes and with characters as modeling units are carried out within the multi-task learning framework. In addition, multi-task learning combines the CTC and CE criteria: an auxiliary task that uses senones as modeling units under the CE criterion further optimizes the end-to-end acoustic model. On Tibetan speech recognition tasks, the experimental results demonstrate that the proposed strategy significantly improves the Tibetan character recognition rate compared with an acoustic model based on transfer learning.

Finally, this thesis proposes using a Feedforward Sequential Memory Network (FSMN) as the end-to-end acoustic model to further accelerate training. The FSMN models long-term dependencies in the time signal with memory blocks, thereby avoiding the time-consuming bidirectional recurrence of recurrent neural networks. The tapped-delay-line structure in the memory block makes model training more reliable and faster. Experiments on the Tibetan speech recognition task show that this method speeds up acoustic model training by at least five times while the recognition rate degrades by only 0.2%.
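The first sparseness remedy above, replacing tonal Mandarin units with non-tonal syllables, can be sketched with a toy example (the helper names and data are illustrative assumptions, not the thesis implementation). Dropping the tone digit merges several tonal syllables into a single CTC output unit, shrinking the inventory that must be estimated from scarce bilingual data:

```python
# Sketch: building a smaller CTC output-unit inventory by dropping
# tone marks from Mandarin pinyin syllables (toy data, not the corpus).
def to_non_tonal(syllable: str) -> str:
    """Map a tonal pinyin syllable like 'ma3' to its non-tonal form 'ma'."""
    return syllable.rstrip("012345")  # tone digits 1-4 plus 5 for neutral tone

def build_unit_inventory(transcripts):
    """Collect the set of CTC output units from tokenized transcripts."""
    units = set()
    for tokens in transcripts:
        units.update(to_non_tonal(t) for t in tokens)
    return sorted(units)

transcripts = [
    ["ni3", "hao3"],          # tonal syllables
    ["ma1", "ma2", "ma3"],    # one base syllable seen with three tones
]
print(build_unit_inventory(transcripts))  # ['hao', 'ma', 'ni'] -- 3 units, not 5
```

Because each merged unit now pools the training frames of all its tonal variants, rare units receive more examples, which is the point of the substitution under low-resource conditions.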
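The tapped-delay-line memory block described in the last paragraph can be illustrated with a minimal numerical sketch (a scalar-coefficient variant; the function name, tap counts, and toy values are assumptions for illustration, not the thesis code). Each output frame is the current hidden activation plus weighted taps of past, and optionally future, activations, so temporal context is captured without recurrent feedback:

```python
def fsmn_memory_block(h, a_back, a_ahead):
    """Scalar-FSMN memory block (illustrative sketch, not the thesis code).

    p_t = h_t + sum_i a_back[i] * h_{t-i} + sum_j a_ahead[j] * h_{t+j}

    h is a list of T hidden-activation vectors (lists of floats);
    a_back / a_ahead hold the look-back / look-ahead tap coefficients.
    """
    T = len(h)
    p = []
    for t in range(T):
        frame = list(h[t])  # start from the current activation
        for i, a in enumerate(a_back, start=1):   # taps on past frames
            if t - i >= 0:
                frame = [x + a * y for x, y in zip(frame, h[t - i])]
        for j, a in enumerate(a_ahead, start=1):  # taps on future frames
            if t + j < T:
                frame = [x + a * y for x, y in zip(frame, h[t + j])]
        p.append(frame)
    return p

h = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
print(fsmn_memory_block(h, a_back=[0.5], a_ahead=[]))
# [[0.0, 1.0], [2.0, 3.5], [5.0, 6.5]]
```

Because every tap is a fixed feed-forward connection, all frames can be computed in parallel rather than sequentially, which is the source of the training-speed advantage the abstract reports over bidirectional recurrent networks.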
Keywords/Search Tags:Tibetan, Deep Learning, Acoustic Model, Multi-Task Learning, Feedforward Sequential Memory Networks, End-to-End