The Recognition Model Research Based On Whole Acoustic Structure Features Of Speech Unit

Posted on:2006-06-08

Degree:Doctor

Type:Dissertation

Country:China

Candidate:S N He

Full Text:PDF

GTID:1118360152498259

Subject:Circuits and Systems

Abstract/Summary:

The objective of speech recognition is to convert speech into words correctly. A speech recognition model based on Bayes' rule must properly represent and use knowledge from both the acoustic layer and the language layer. It has long been recognized that recognition rates improve substantially when the recognition unit is defined according to the concrete features of the object, endpoints are detected accurately, the chosen parameters capture large acoustic differences between units while resisting many kinds of disturbance, and the recognition model built on these factors has minimal overlap between the spatial distributions of different units. This thesis focuses on endpoint detection for natural and telephone speech, the construction of an acoustic recognition model for whole Chinese syllables, the design and performance analysis of a robust recognition model for English digits at low SNR, and experiments demonstrating the feasibility and effectiveness of the proposed algorithms and models.

The main contributions of this thesis can be summarized as follows:

(1) For Chinese continuous speech with high SNR, we propose an endpoint detection algorithm that uses only the short-time peak-valley energy of Chinese syllables. It is simple, easy to use, and achieves a high detection rate. Experiments show that more than 96% of Chinese syllable endpoints are detected successfully.

(2) After analyzing the weak pronunciation caused by Chinese co-articulation and some deficiencies of time-domain endpoint detection, we develop a novel endpoint detection algorithm based on multi sub-band spectrum features. It uses spectrogram information to locate the exact local jump time between adjacent Chinese syllables, especially for weak syllables, which conventional detection algorithms often miss. The average detection rate is more than 97%.

(3) For telephone digit speech with low SNR and random noise, we design a mixed pulse detection algorithm based on the frame time-domain energy and the energy of the main spectral band (300-1500 Hz). It is better suited to narrow-band telephone speech. The digit endpoints are found by rectifying, filtering, and combining the pulse sequences and then adjusting their locations. Its main merit is its adaptability to a wide range of SNR: it can detect the edge points of speech at an SNR as low as 3.5 dB.
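The general idea of energy-based endpoint detection in contribution (1) can be illustrated with a minimal Python sketch. It is not the thesis's peak-valley algorithm: it assumes a simpler rule in which a frame counts as speech when its short-time energy exceeds a fixed fraction of the peak frame energy. The function names, the frame length, the `ratio` threshold, and `min_frames` are all hypothetical choices made for illustration, and the band-limited (300-1500 Hz) energy of contribution (3) is not modeled.

```python
import math


def frame_energies(samples, frame_len):
    """Sum of squared samples for each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_segments(samples, frame_len=80, ratio=0.1, min_frames=2):
    """Return (start_sample, end_sample) pairs for high-energy runs.

    `ratio` (fraction of the peak frame energy used as the threshold)
    and `min_frames` are illustrative assumptions, not thesis values.
    """
    energies = frame_energies(samples, frame_len)
    if not energies:
        return []
    threshold = ratio * max(energies)
    segments, start = [], None
    # A trailing zero-energy sentinel closes a run that reaches the end.
    for idx, energy in enumerate(energies + [0.0]):
        if energy > threshold and start is None:
            start = idx
        elif energy <= threshold and start is not None:
            if idx - start >= min_frames:
                segments.append((start * frame_len, idx * frame_len))
            start = None
    return segments


def synth(rate=8000):
    """Synthetic test signal: two 440 Hz bursts separated by silence."""
    silence = [0.0] * 800
    tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(1600)]
    return silence + tone + silence + tone + silence


if __name__ == "__main__":
    print(detect_segments(synth()))
```

On the synthetic signal, the two tone bursts stand in for two syllables separated by silence. Real continuous speech rarely has such clean gaps, which is why the thesis turns to peak-valley energy and sub-band spectral features instead of a single fixed threshold.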

Keywords/Search Tags:

Acoustic layer, Language layer, Endpoint Detection, Multi Sub-band Spectrum, HMM (Hidden Markov Model), DTW (Dynamic Time Warping), MFCC (Mel-Frequency Cepstral Coefficients), Confidence Measure

Related items

1	Study Of Speech Recognition System For Mandarin Digit Based On HMM
2	Study On Isolated Mandarin Speech Recognition Technology
3	Research And Implementation Of Speech Recognition Algorithm Based On DSP
4	Study Of Mandarin Digit Speech Recognition Algorithm Based On HMM Model
5	The Research Of Small Vocabulary Speaker-independent Isolated Word Speech Recognition System
6	The Research Of Small Vocabulary Speaker-Independent Isolated Word Speech Recognition System
7	Research On Phonetic Similarity Evaluation Algorithm
8	Study On The System Of Mandarin Digit Speech On The Basis Of DSP
9	The Research On The Speech Recognition System In The Noisy Environment
10	Research On The Automatic Classification Of Cough