
Research into Speech Recognition Based on Deep Learning

Posted on: 2015-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: J Liang
Full Text: PDF
GTID: 2298330467462377
Subject: Signal and Information Processing
Abstract/Summary:
In the era of the mobile Internet, speech recognition remains the key to achieving free human-computer interaction. Meanwhile, in the age of big data, deep learning is attracting researchers' attention because of its efficiency in information mining. Research into speech recognition based on deep learning theory is therefore of great theoretical significance and practical value.

Deep learning is essentially an information extraction technique that relies on multi-layer nonlinear transformations. Its hierarchical structure makes it useful for modeling complex relationships among data. This thesis first introduces the basic principles and research status of speech recognition, then elaborates the basic theory and network models of deep learning, and then focuses on how to fully exploit the potential of deep learning theory in speech recognition research.

1. Speech feature extraction based on the Deep Auto-encoder model
Good acoustic features play an important role in recognition systems. This thesis concentrates on the principle of the auto-encoder and discusses in depth some crucial components, such as feature preprocessing, network structure, and parallel training strategy. Moreover, a deep auto-encoder fed with MFCC features is built on the Matlab platform in order to extract more robust features from the raw ones. Finally, the evaluation system is constructed with HTK. Compared with MFCC features, the experiment shows improvements of 1.96% and 3.53% in word error rate when using the new features obtained with unsupervised and supervised training, respectively.

2. Acoustic modeling based on the Deep Neural Network model
The acoustic model is also an indispensable component of a speech recognition system. This thesis first analyzes the similarities and differences between the neural network and the Gaussian mixture model with respect to model structure and training methods, then clarifies the feasibility of the DNN-HMM model, which is used to give a more accurate description of the output probability. GMM-HMM and DNN-HMM acoustic models are trained separately on the Kaldi platform. With the RM corpus as training data, the experiment shows that applying the DNN-HMM model yields a relative reduction of 30% in the system's word error rate.
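
As an illustration of how a DNN can supply the HMM output probability, the following is a minimal Python sketch of the common hybrid practice of dividing state posteriors by state priors to obtain scaled likelihoods. The abstract does not state that the thesis uses exactly this conversion, so it is an assumption here. The function names, the three-state toy posteriors, and the prior values are hypothetical and are not taken from the thesis. A small helper also shows what a relative word error rate reduction means.

```python
import numpy as np


def scaled_log_likelihoods(posteriors, state_priors, eps=1e-10):
    """Turn DNN state posteriors p(s|x) into scaled log-likelihoods.

    By Bayes' rule, p(x|s) = p(s|x) * p(x) / p(s). For a given frame,
    p(x) is the same for every state, so log p(s|x) - log p(s) can be
    used in place of log p(x|s) during HMM decoding.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    state_priors = np.asarray(state_priors, dtype=float)
    # eps guards against log(0); priors broadcast across frames (rows).
    return np.log(posteriors + eps) - np.log(state_priors + eps)


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative reduction: the share of the baseline error that was removed."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical toy data: 2 frames, 3 HMM states (not from the thesis).
posteriors = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.3, 0.6]])
state_priors = np.array([0.5, 0.3, 0.2])

scores = scaled_log_likelihoods(posteriors, state_priors)
best_states = scores.argmax(axis=1)
```

In a real system the priors would typically be estimated from state frequencies in the training alignments, and the scores would be passed to the decoder rather than maximized frame by frame. The per-frame `argmax` here only makes the toy output easy to inspect.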
Keywords/Search Tags:Speech recognition, Deep learning, Feature extraction, Acoustic modeling, DNN, Deep Auto-encoder