The purpose of speech recognition is to give machines an auditory function, so that they can receive human speech directly, understand the speaker's intentions, and respond appropriately. Speech recognition can be widely used in information processing, communications and electronic systems, automation, and other fields. At present, the accuracy of speech endpoint detection is satisfactory in quiet environments. However, recognition performance degrades severely in the various noisy environments encountered in practice. As a result, the study of speech recognition in heavy noise is particularly important.

This paper first explains, from the history, development trends, and open problems of the field, why speech recognition in noisy environments is important and unavoidable, and then introduces the main techniques used in speech recognition systems.

A speech recognition system includes preprocessing, feature extraction, a reference model, and pattern matching. Endpoint detection is one of the key steps that determine recognition performance. Feature extraction transforms the preprocessed speech signal so as to keep its essential content and remove redundant information; typical features include the short-term zero-crossing rate, LPCC, and MFCC.

A speech recognition system works in two stages. The first stage is learning and training. Training data, usually carefully selected for the recognition system in question, are collected. The characteristic parameters of the speech signal are extracted by signal processing methods. Starting from the initial values of the system parameters, the parameters are adjusted so that the system better fits the training data. Finally, the trained parameters are stored as the reference model. The second stage is recognition.
The characteristic parameters of the input speech signal are compared with the templates obtained during training, the degree of match is measured under a given error criterion, and the recognition result is output.

When the training environment does not match the actual environment, performance declines markedly. In real life, however, speech signals are inevitably affected by the surrounding environment. This paper therefore focuses on speech enhancement before endpoint detection. It first studies in depth the basic methods of speech enhancement and their theoretical derivation, including Wiener filtering, Kalman filtering, spectral subtraction, and adaptive filtering. The paper also applies the wavelet transform to speech enhancement. This method performs several successive levels of wavelet decomposition and sets a threshold for the wavelet coefficients at each scale; the thresholded coefficients are then passed through the inverse wavelet transform to reconstruct the signal, recovering the useful signal and removing the noise.

The paper processes speech signals with an SNR of 5 using wavelet soft and hard thresholding. With the hard-threshold method, the enhancement is not very satisfactory and the reconstructed signal exhibits oscillations; with the soft-threshold method, the reconstructed signal is more heavily distorted. Because the enhanced speech has lost important information, neither method can be applied to speech recognition. On this basis, the paper analyzes improved wavelet threshold methods proposed by different researchers, including the compromise threshold method, the μ-law threshold method, and an improved threshold method designed for a smooth transition.
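The hard and soft threshold rules discussed above, together with one common form of the compromise rule, can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the function names, the threshold value `t`, and the weight `alpha` are assumptions, and the coefficients are a made-up list rather than the output of a real wavelet decomposition.

```python
import math


def hard_threshold(coeffs, t):
    """Keep coefficients whose magnitude reaches t unchanged; zero the rest."""
    return [x if abs(x) >= t else 0.0 for x in coeffs]


def soft_threshold(coeffs, t):
    """Zero small coefficients and shrink the surviving ones toward zero by t."""
    return [math.copysign(abs(x) - t, x) if abs(x) >= t else 0.0 for x in coeffs]


def compromise_threshold(coeffs, t, alpha=0.5):
    """Assumed compromise rule: shrink survivors by alpha * t, 0 <= alpha <= 1.

    alpha = 0 reproduces hard thresholding, alpha = 1 soft thresholding.
    """
    return [
        math.copysign(abs(x) - alpha * t, x) if abs(x) >= t else 0.0
        for x in coeffs
    ]


# Hypothetical detail coefficients, for illustration only.
coeffs = [0.2, -1.5, 3.0, -0.4]
print(hard_threshold(coeffs, 1.0))             # [0.0, -1.5, 3.0, 0.0]
print(soft_threshold(coeffs, 1.0))             # [0.0, -0.5, 2.0, 0.0]
print(compromise_threshold(coeffs, 1.0, 0.5))  # [0.0, -1.0, 2.5, 0.0]
```

The hard rule leaves a jump at the threshold, which is consistent with the oscillations noted above, while the soft rule biases every surviving coefficient by `t`, which is consistent with the larger distortion. The compromise rule trades one effect against the other through `alpha`.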
The paper proposes a new threshold method, and the experimental results show greatly improved speech enhancement.

In addition, wavelet decomposition divides the speech signal into high-frequency and low-frequency parts. According to the paper, the high-frequency part carries mainly speech and the low-frequency part mainly noise, so the low-frequency part can be attenuated. The paper adopts an improved method, Kalman filtering based on the wavelet transform, because the Kalman filter is the optimal estimator in the minimum mean square error sense. This method retains the advantages of both the Kalman filter and wavelet analysis.

According to psychoacoustics, the human auditory system can perceive sound from 20 Hz to 20 kHz, and perception differs markedly between low and high frequencies; sensitivity is highest in the range from 500 Hz to 7 kHz. The wavelet transform can therefore be combined with the auditory masking effect. First, considering the consistency of loudness perception across sub-band signals, the threshold applied to each sub-band output can be set according to the frequency curve of the auditory threshold. Second, according to the masking characteristics and the actual quantized threshold, sub-band signals with weak energy can be removed.

Then, for endpoint detection, threshold-based detection and band variance detection are applied to the enhanced speech signals. Experiments show that the improved wavelet threshold denoising method cannot detect the start and end points of the pronunciation of each word. Although this enhancement method loses certain speech features, it effectively removes part of the noise.
The other two methods reduce noise to some degree, but their endpoint detection results are not very good.

Then the paper extracts characteristic parameters of the speech signal. It selects the Mel Frequency Cepstrum Coefficient (MFCC) as the feature and uses the dynamic time warping (DTW) algorithm to compute the pattern matching result. Taking the utterance '0' at different SNR values (10, 5, 1) as an example, the paper enhances the speech signals with the three methods it presents and calculates the DTW value against the template '0'. The experimental results show a marked improvement in matching.
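The dynamic time warping match mentioned above can be sketched as follows. The paper does not state its local distance or path constraints, so the Euclidean frame distance and the standard three-direction recurrence used here are assumptions, as are the function names. The one-dimensional "feature vectors" are toy values standing in for MFCC frames.

```python
import math


def frame_distance(u, v):
    """Euclidean distance between two feature vectors (assumed local cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def dtw_distance(a, b):
    """Accumulated cost of the best warping path between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] holds the minimum cost of aligning a[:i] with b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_distance(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


# Toy sequences, for illustration only.
template = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [1.0], [1.0], [2.0]]  # same shape, spoken more slowly
shifted = [[0.5], [1.5], [2.5]]           # same length, offset values

print(dtw_distance(template, stretched))  # 0.0
print(dtw_distance(template, shifted))    # 1.5
```

A smaller value means a closer match to the template, so comparing the DTW value before and after enhancement gives a simple measure of how much an enhancement method helps recognition.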