The Recognition Model Research Based On Whole Acoustic Structure Features Of Speech Unit

Posted on: 2006-06-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S N He
Full Text: PDF
GTID: 1118360152498259
Subject: Circuits and Systems
Abstract/Summary:
The objective of speech recognition is to convert speech into words correctly. A speech recognition model based on Bayes' rule must properly represent and use knowledge from both the acoustic layer and the language layer. Years of work have shown that recognition rates improve significantly when the recognition unit is defined according to the concrete features of the target, endpoints are detected accurately, the chosen parameters capture more of the acoustic differences among units while resisting many disturbance factors, and the recognition model built on these elements has distributions that overlap as little as possible. This thesis focuses on endpoint detection for natural and telephone speech, the creation of an acoustic recognition model for whole Chinese syllables, the design and performance analysis of a robust English digit recognition model for low-SNR conditions, and experiments demonstrating the feasibility and effectiveness of the proposed algorithms and models. The main contributions of this thesis can be summarized as follows:
(1) For Chinese continuous speech with high SNR, we propose an endpoint detection algorithm that uses only the short-time peak-valley energy of Chinese syllables. It is simple, easy to use, and achieves a high detection rate. Experiments show that more than 96% of the endpoints of Chinese syllables are detected successfully.
(2) After analyzing the weak pronunciation caused by Chinese co-articulation and some deficiencies of time-domain endpoint detection, we develop a novel endpoint detection algorithm based on multi sub-band spectrum features. It uses spectrogram information to locate the exact local transition time between adjacent Chinese syllables, especially for weak syllables, which conventional detection algorithms often miss. The average detection rate is more than 97%.
(3) For telephone digit speech with low SNR and random noise, we design a mixed pulse detection algorithm based on the time-domain frame energy and the energy of the main spectral band (300 to 1500 Hz). It is better suited to narrowband telephone speech. The digit endpoints are found by rectifying, filtering, and combining the pulse sequences and then adjusting their locations. Its main merit is its adaptability to a broad range of SNRs: it can detect speech endpoints at SNRs as low as 3.5 dB.
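The idea common to the time-domain detectors described above, marking speech where frame energy rises above the background, can be illustrated with a minimal sketch. This is not the thesis's peak-valley or mixed pulse algorithm, whose details the abstract does not give. It is a plain fixed-threshold detector on a synthetic signal, and the function names, frame length, threshold ratio, and test signal are all assumptions chosen for illustration.

```python
import math


def frame_energies(samples, frame_len, hop):
    """Short-time energy: sum of squared samples in each frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


def detect_endpoints(energies, threshold, min_frames=2):
    """Return (start, end) frame index pairs, end exclusive, for each run
    of at least min_frames consecutive frames with energy above threshold."""
    segments = []
    start = None
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i  # rising edge: candidate start point
        elif energy <= threshold and start is not None:
            if i - start >= min_frames:
                segments.append((start, i))  # falling edge: end point
            start = None
    # Close a segment that is still open at the end of the signal.
    if start is not None and len(energies) - start >= min_frames:
        segments.append((start, len(energies)))
    return segments


def tone(n, amp=1.0, freq=440.0, rate=8000):
    """Hypothetical stand-in for a voiced syllable: a pure sine burst."""
    return [amp * math.sin(2 * math.pi * freq * t / rate) for t in range(n)]


# Synthetic test signal (assumed 8 kHz): silence, burst, silence, burst, silence.
signal = [0.0] * 800 + tone(1600) + [0.0] * 800 + tone(1600) + [0.0] * 800
energies = frame_energies(signal, frame_len=200, hop=200)
# Threshold set relative to the loudest frame; the 0.1 ratio is an assumption.
segments = detect_endpoints(energies, threshold=0.1 * max(energies))
print(segments)
```

On this synthetic signal the two bursts are reported as frames 4 to 12 and 16 to 24. A fixed threshold like this degrades quickly as noise rises, which is the weakness the sub-band and mixed pulse approaches in (2) and (3) are meant to address.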
Keywords/Search Tags:Acoustic layer, Language layer, Endpoint Detection, Multi Sub-band Spectrum, HMM(Hidden Markov Model), DTW(Dynamic Time Warping), MFCC(Mel Frequency Cepstrum Coefficients), Confidence Measure