
Empirical Mode Decomposition And The Deep Belief Network In The Application Of Speech Recognition Research

Posted on: 2016-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: W Chen
Full Text: PDF
GTID: 2308330467473435
Subject: Control theory and control engineering
Abstract/Summary:
With the rapid development of intelligent applications and networking technology, speech recognition, as a convenient and effective means of human-computer interaction, is becoming increasingly important. However, because speech is produced in complicated environments and accents differ widely among speakers, accurate speech recognition remains difficult, and speech products often fail to deliver the expected user experience. A breakthrough in speech recognition technology therefore requires improvement and innovation at the theoretical level.

Speech signals are typically nonlinear and non-stationary, whereas traditional speech processing methods such as the Fourier transform and the wavelet transform assume that the signal is stationary over short intervals. This assumption inevitably interferes with the analysis and extraction of useful information. Against this background, this paper first describes the basics of speech recognition, and then focuses on a speech endpoint detection algorithm under different signal-to-noise ratios (SNR) and an isolated-word speech recognition system based on a deep belief network.

The main contents and innovations of this paper are as follows:

(1) After studying the limitations of traditional speech processing methods in depth, this paper focuses on empirical mode decomposition as a new signal processing method and verifies algorithmically that it is suitable for nonlinear and non-stationary signals.

(2) Traditional speech endpoint detection algorithms have low detection accuracy on noisy speech, so this paper proposes a new speech endpoint detection algorithm based on empirical mode decomposition and composite energy. First, the algorithm uses empirical mode decomposition to decompose the speech signal into a set of intrinsic mode functions and a residue, filters out the low-order intrinsic mode functions, which contain much of the noise, and reconstructs the speech signal from the remaining intrinsic mode functions. Next, exploiting the ability of the Teager energy to suppress noise amplitude, it computes the Teager energy of each frame of the reconstructed signal and obtains a composite energy per frame by combining the Teager energy with the short-term energy through weighting. Finally, it obtains the start and end points of the speech by comparing each frame's composite energy with an adaptive threshold. Simulation results show that the algorithm is effective: compared with the traditional dual-threshold method based on short-term energy and zero-crossing rate, it performs better in low-SNR environments.

(3) Traditional neural networks have two shortcomings in speech recognition: training is slow, and it easily falls into local minima. This paper therefore designs an isolated-word speech recognition system based on a deep belief network. The system first trains the first restricted Boltzmann machine (RBM) on its own, then uses the output of the trained first RBM as the input of the second RBM and trains that one on its own, and so on until the last RBM. All trained RBMs are then stacked into a deep belief network, which is fine-tuned with the back-propagation algorithm to yield the trained deep belief network model. Finally, the Mel cepstrum parameters of the speech signal are used as the input of the deep belief network to perform isolated-word recognition. In simulation experiments, this system achieves a higher recognition rate than an improved BP neural network.
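The frame-level stage of the endpoint detector in item (2) can be sketched as follows. This is a minimal illustration, not the author's implementation: the abstract gives neither the composite-energy formula nor the threshold rule, so the convex weighting in `composite_energy` (parameter `weight`) and the threshold set to a multiple (`scale`) of the mean energy of the leading `noise_frames` frames are assumptions, as are all function names and constants. The EMD denoising stage is omitted; the input is taken to be the already reconstructed signal.

```python
import math
import random

FRAME_LEN = 200  # samples per frame (assumed value)
HOP = 100        # frame shift in samples (assumed value)


def frame_signal(x, frame_len, hop):
    """Split a sequence into overlapping frames (an incomplete tail is dropped)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_term_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)


def teager_energy(frame):
    """Sum over the frame of |x[n]^2 - x[n-1] * x[n+1]| (discrete Teager operator)."""
    return sum(abs(frame[n] ** 2 - frame[n - 1] * frame[n + 1])
               for n in range(1, len(frame) - 1))


def composite_energy(frame, weight=0.5):
    """Assumed weighting: convex combination of Teager and short-term energy."""
    return weight * teager_energy(frame) + (1.0 - weight) * short_term_energy(frame)


def detect_endpoints(x, frame_len=FRAME_LEN, hop=HOP, noise_frames=5, scale=3.0):
    """Return (start_frame, end_frame) of the active region, or None.

    The threshold adapts to the recording: it is `scale` times the mean
    composite energy of the first `noise_frames` frames, which are assumed
    to contain background noise only.
    """
    energies = [composite_energy(f) for f in frame_signal(x, frame_len, hop)]
    noise_floor = sum(energies[:noise_frames]) / noise_frames
    threshold = scale * noise_floor
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0], active[-1]


# Synthetic test signal: weak noise with a 200 Hz tone burst in the middle.
random.seed(0)
fs = 8000
signal = [random.gauss(0.0, 0.01) for _ in range(12000)]
for n in range(4000, 8000):
    signal[n] += 0.5 * math.sin(2 * math.pi * 200 * n / fs)

start_frame, end_frame = detect_endpoints(signal)
start_sample = start_frame * HOP
end_sample = end_frame * HOP + FRAME_LEN
print(start_sample, end_sample)
```

On this synthetic signal the burst occupies samples 4000 to 7999, so the printed boundaries should fall within one frame length of those positions. Real speech would call for the EMD reconstruction step first and a more careful threshold update than this fixed noise-floor estimate.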
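The greedy layer-by-layer RBM training described in item (3) can be sketched as follows. This is an illustration under stated assumptions rather than the system from the thesis: the abstract does not name the RBM training rule, so one-step contrastive divergence (CD-1) is assumed here, and the class `RBM`, the functions `pretrain_stack` and `propagate`, the layer sizes, and the toy binary data (standing in for Mel cepstrum features) are all hypothetical. The back-propagation fine-tuning stage is not shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class RBM:
    """Binary restricted Boltzmann machine trained with CD-1 (assumed rule)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        """P(h_j = 1 | v) for every hidden unit."""
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h):
        """P(v_i = 1 | h) for every visible unit."""
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def cd1_update(self, v0, lr):
        """One contrastive-divergence step on a single training vector."""
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)  # reconstruction
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b_vis[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_hid[j] += lr * (h0[j] - h1[j])


def pretrain_stack(data, hidden_sizes, epochs=20, lr=0.1, seed=0):
    """Train one RBM at a time; each trained RBM's output feeds the next one."""
    rng = random.Random(seed)
    rbms, layer_input = [], data
    for n_hidden in hidden_sizes:
        rbm = RBM(len(layer_input[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1_update(v, lr)
        rbms.append(rbm)
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return rbms


def propagate(rbms, v):
    """Pass one input vector upward through the stacked RBMs."""
    for rbm in rbms:
        v = rbm.hidden_probs(v)
    return v


# Toy binary patterns used only to exercise the code.
toy_data = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]] * 10
stack = pretrain_stack(toy_data, hidden_sizes=[4, 3])
features = propagate(stack, toy_data[0])
```

In a full deep belief network the stacked weights would initialize a feed-forward classifier with an output layer over the word classes, which back-propagation then fine-tunes on labelled data.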
Keywords/Search Tags: Speech endpoint detection, Empirical mode decomposition, Composite energy, Speech recognition, Deep belief network