Cross-domain Speech Recognition Research Based On Deep Learning

Posted on: 2018-06-24
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Guo
Full Text: PDF
GTID: 2348330569486316
Subject: Electronic and communication engineering
Abstract/Summary:
With the arrival of the big-data era, deep learning has attracted wide attention. It is an important branch of machine learning and a key technology for human-computer interaction. Its networks apply multiple layers of nonlinear transformations, and the resulting hierarchical feature structure gives them strong data-modeling ability. Deep learning has been applied widely, in areas such as image processing, speech, and computer vision, with good results. At the same time, speech recognition is being used in more and more fields.

This thesis first introduces the basic principles of speech recognition and the theory of deep learning, and then describes how deep learning is applied to cross-domain speech recognition, mainly by using a deep neural network (DNN) to train the acoustic model and a recurrent neural network (RNN) to train the language model. Speech recognition generally involves several basic steps: data preparation, feature extraction, model building, and decoding. Based on deep learning, this thesis builds a large-vocabulary speech recognizer on the Kaldi platform. To address the mismatch that arises in cross-domain recognition, it also studies language model adaptation. The main work is as follows:

(1) The required corpus is created and the Kaldi development platform is set up on a Linux system, including source code compilation, configuration of the operating environment, and installation of CUDA.

(2) A GMM-HMM baseline system is built first, followed by DNN-HMM model training, system optimization, and parameter tuning; the modeling capabilities of the two systems are then analyzed and compared. The experimental results show that the DNN-based acoustic model reduces recognition errors by 8.88% compared with the GMM system.

(3) A language model adaptation framework is proposed to reduce the mismatch between the trained language model and the test corpus in domains where little data is available. The framework has two parts. First, a 3-gram language model is trained on a filtered corpus, and linear interpolation is then used to adapt it, yielding an adapted 3-gram model. Second, because of the strengths of RNN language models, an RNN background model is trained and then adapted to improve recognition on the cross-domain adaptation corpus. Finally, the adapted 3-gram model and the adapted RNNLM are used to rescore the recognition results. The experimental results show that recognition performance after adaptation is about 20% better than with the background model.
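
The linear interpolation step used for 3-gram adaptation can be illustrated with a minimal sketch. The abstract does not say how the interpolation weight is chosen, so the grid search over held-out perplexity below is an assumption, and the function names and toy probabilities are hypothetical, not taken from the thesis.

```python
import math

def interpolate(p_background, p_domain, lam):
    """Linear interpolation: P(w|h) = lam * P_domain + (1 - lam) * P_background."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_domain + (1.0 - lam) * p_background

def perplexity(prob_pairs, lam):
    """Perplexity of the interpolated model on held-out words.

    prob_pairs holds one (p_background, p_domain) pair per held-out word,
    i.e. the probability each model assigns to that word given its history.
    """
    log_sum = sum(math.log(interpolate(pb, pd, lam)) for pb, pd in prob_pairs)
    return math.exp(-log_sum / len(prob_pairs))

def best_weight(prob_pairs, steps=20):
    """Assumed tuning scheme: pick the weight with the lowest held-out perplexity."""
    # Endpoints 0 and 1 are skipped so a zero from one model cannot reach log(0).
    candidates = [i / steps for i in range(1, steps)]
    return min(candidates, key=lambda lam: perplexity(prob_pairs, lam))

# Toy held-out probabilities (illustrative values only).
held_out = [(0.10, 0.30), (0.20, 0.05), (0.01, 0.20), (0.15, 0.10)]
lam = best_weight(held_out)
print(f"chosen weight: {lam:.2f}, perplexity: {perplexity(held_out, lam):.3f}")
```

In practice the two probabilities per word would come from the background and in-domain 3-gram models, and the tuned weight would then be fixed when building the adapted model.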
Keywords/Search Tags: speech recognition, cross-domain, model adaptation, deep learning