Research Into Speech Recognition Application Based On Deep Learning

Posted on: 2016-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: J H Zhang
GTID: 2298330467492452
Subject: Pattern Recognition and Intelligent Systems

Abstract/Summary:
With the spread of smart homes, in-car voice systems, and a variety of voice recognition software, speech recognition has gradually entered public awareness. Its ease of use and high accuracy have won it wide acceptance among users, and as an important interface for human-computer interaction it has become a focus of artificial intelligence research. Against the background of big data, deep learning has developed rapidly. Owing to its strong modeling power on very large amounts of data, it has been applied to speech and image recognition with remarkable results. Given its theoretical significance and practical value, speech recognition based on deep learning is a feasible research direction.

Deep learning essentially uses networks built from multiple layers of nonlinear transformations. Through supervised parameter adjustment, such networks can model complex relationships among data. This thesis summarizes the current state of research and the basic principles of speech recognition, sets out the basic theory and network models of deep learning, and then focuses on how to fully exploit deep learning theory in speech recognition research.

1. Speech feature extraction based on a deep neural network model

A deep neural network is a multi-layer network trained with supervision. It is mainly used for classification, but part of a trained network can also be used to extract new speech features, which perform better in speech recognition than MFCC features. This part of the thesis concentrates on deep neural network pre-training, supervised parameter adjustment, and system optimization. A deep neural network fed with MFCC features is built on the Kaldi platform to extract more robust and discriminative features from the raw ones. In the experiments, the new features extracted with the deep neural network reduce the word error rate by 1.98% and the sentence error rate by 4.21% compared with MFCC features.

2. Initial/final attribute extraction based on a deep neural network model

Initial/final attributes are a kind of speech attribute. As a finer-grained unit, they give a more detailed description of phonetic phenomena. The method adds phonetic knowledge to speech recognition and achieves good results. The thesis studies the basic theory of speech attributes: starting from attribute extraction and combining it with the principles of deep learning, it builds a speech attribute extractor and then builds initial/final recognizers based on GMM-HMM and DNN-HMM using the extracted attributes. In the experiments, the two recognizers achieve improvements of 0.65% and 1.37% in correct rate when using initial/final attributes compared with MFCC features.

3. Acoustic modeling based on a deep neural network model

A supervised deep learning network is essentially a discriminative model. The deep network, with its strong modeling ability, replaces the shallow GMM to give a more accurate description of the output probabilities, and is combined with an HMM to train the acoustic model. GMM-HMM, DNN-HMM, and CNN-HMM acoustic models are each trained separately on the Kaldi platform. With the 863 corpus as training data, the experiments show that the DNN-HMM and CNN-HMM systems reduce the word error rate by 7.98% and 9.01%, respectively. The three methods are then compared and analyzed.
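The results above are reported as word error rate and sentence error rate. The abstract does not say how these were computed, so the following is only a minimal sketch under the conventional definitions: word error rate as the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words, and sentence error rate as the fraction of sentences with at least one error. The function names and the whitespace tokenization are assumptions made for illustration and are not taken from the thesis or from Kaldi.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(refs, hyps):
    """Total word-level edit distance divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split())
                 for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words


def sentence_error_rate(refs, hyps):
    """Fraction of sentences whose hypothesis differs from the reference."""
    wrong = sum(r.split() != h.split() for r, h in zip(refs, hyps))
    return wrong / len(refs)
```

For example, with references `["the cat sat", "hello world"]` and hypotheses `["the cat sat", "hello word now"]`, the second sentence contains one substitution and one insertion against five reference words in total, so this sketch gives a word error rate of 0.4 and a sentence error rate of 0.5.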
Keywords/Search Tags:Speech recognition, Deep learning, Feature extraction, Acoustic modeling, DNN, CNN