Cross-domain Speech Recognition Research Based On Deep Learning

Posted on:2018-06-24

Degree:Master

Type:Thesis

Country:China

Candidate:S Q Guo

GTID:2348330569486316

Subject:Electronic and communication engineering

Abstract/Summary:

With the advent of the big-data era, deep learning has attracted wide attention. It is an important branch of machine learning and a key technology for human-computer interaction; its multi-layer nonlinear transformation network and hierarchical feature structure give it strong data-modeling ability. It has been widely applied to images, speech, and computer vision, with good results. At the same time, speech recognition is increasingly applied in many fields.

This thesis first introduces the basic principles of speech recognition and the theory of deep learning, and then describes how to apply deep learning to cross-domain speech recognition, mainly by using a deep neural network (DNN) and a recurrent neural network (RNN) to train the acoustic model and the language model, respectively. Speech recognition generally involves several basic steps: data preparation, feature extraction, model building, and decoding. Based on deep learning, this thesis builds a large-vocabulary speech recognizer on the Kaldi platform. To address the mismatch problem in cross-domain recognition, it also studies language model adaptation. The main work is as follows:

(1) Create the required corpus and set up the Kaldi development platform on Linux, including source code compilation, runtime environment configuration, and CUDA installation.

(2) Build a GMM-HMM baseline system, then train the DNN-HMM model, optimize the system and tune its parameters, and analyze and compare the modeling capabilities of the two systems. The experimental results show that the DNN-based acoustic model reduces the recognition error rate by 8.88% compared with the GMM system.

(3) In some domains with only a small amount of data, the trained language model does not match the test corpus, so a language model adaptation framework is proposed to reduce this mismatch. The framework has two parts. First, a 3-gram language model is trained on a filtered corpus, and linear interpolation is then used to adapt it, yielding an adapted 3-gram model. Second, because of the strengths of RNN language models, an RNN background model is trained and then adapted to improve recognition on the cross-domain adaptation corpus. Finally, the adapted 3-gram model and the adapted RNNLM are used to rescore the recognition results. The experimental results show that recognition after adaptation is about 20% better than with the background model.
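The linear interpolation and rescoring steps mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes toy unigram tables instead of 3-gram models, a hand-picked interpolation weight `lam`, and a made-up two-entry n-best list with invented acoustic scores. The function names `interpolate` and `rescore` are hypothetical.

```python
import math


def interpolate(p_adapt, p_background, lam):
    """Linearly interpolate two word-probability tables.

    P(w) = lam * P_adapt(w) + (1 - lam) * P_background(w)
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    vocab = set(p_adapt) | set(p_background)
    return {
        w: lam * p_adapt.get(w, 0.0) + (1.0 - lam) * p_background.get(w, 0.0)
        for w in vocab
    }


def rescore(nbest, lm, lm_weight=1.0, floor=1e-10):
    """Return the hypothesis with the best acoustic + weighted LM log score.

    Each hypothesis is a (words, acoustic_log_prob) pair. Unknown words
    get a small floor probability so that log() stays defined.
    """
    def total(hyp):
        words, acoustic_logp = hyp
        lm_logp = sum(math.log(max(lm.get(w, 0.0), floor)) for w in words)
        return acoustic_logp + lm_weight * lm_logp

    return max(nbest, key=total)


# Toy data (assumed for illustration only).
p_background = {"the": 0.5, "stock": 0.1, "sock": 0.4}  # general-domain model
p_domain = {"the": 0.4, "stock": 0.5, "sock": 0.1}      # small in-domain model
adapted = interpolate(p_domain, p_background, lam=0.7)

nbest = [
    (["the", "sock"], -10.0),   # slightly better acoustic score
    (["the", "stock"], -10.2),
]
best_background = rescore(nbest, p_background)
best_adapted = rescore(nbest, adapted)
```

In this toy setup the background model keeps the acoustically preferred hypothesis, while the adapted model shifts enough probability to the in-domain word to change the choice. In practice the interpolation weight is usually tuned on held-out in-domain data rather than fixed by hand.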

Keywords/Search Tags:

speech recognition, cross-domain, model adaptation, deep learning

Related items

1	Research On Cross-domain Expression Recognition Based On Domain Adaptation Method
2	Research On The Cross-race Face Recognition Algorithm Based On Deep Domain Adaptation
3	Research On Adaptation Methods In Deep Learning Based Speech Recognition Systems
4	Speech Emotion Recognition Via Domain Adaptation
5	The Research Of Cross-user Activity Recognition Based On Deep Learning And Unsupervised Domain Adaptation
6	Research On Cross Corpus Speech Emotion Recognition Based On Domain Adversarial Training
7	Research On Speech Emotion Recognition Methods Based On Deep Learning And Transfer Learning
8	Research On Acoustic Model Of Speech Recognition In Educational Scene Based On Deep Learning
9	Research On Cross-domain Object Detection In Remote Sensing Images
10	Research On Cross-domain Person Re-identification Based On Deep Learning