
Voice Signal Front-end Processing Technology Research

Posted on: 2006-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: D Chen
Full Text: PDF
GTID: 2208360152482395
Subject: Computer applications
Abstract/Summary:
This thesis addresses endpoint detection and speech enhancement for noisy speech signals with low SNR. Both are preprocessing steps, and their accuracy strongly affects subsequent processing such as speech coding and speech recognition. Effective endpoint detection greatly reduces processing time and keeps noise in the silent segments from interfering with later stages. Speech enhancement attempts to recover the clean speech signal from the noisy one and thereby improve its SNR.

The contributions of the thesis are summarized as follows:

1. An endpoint detection approach based on short-time energy and short-time average zero-crossing rate is studied, and its energy-threshold decision is improved by replacing the minimum energy of the silent segment with the mean energy. The improved approach is tested under different noise backgrounds.

2. An endpoint detection approach based on frequency band variance is discussed. It locates the beginning and ending points of speech from the spectral difference between speech segments and noise segments. In addition, discontinuities of the frequency band variance within silent frames are removed to avoid impulsive interference caused by microphone shaking.

3. An endpoint detection approach based on information entropy is analyzed. It detects the beginning and ending points of speech on the basis that the information entropy of speech segments is higher than that of silent segments. How to choose the threshold and how to remove discontinuities of the entropy within silent frames are also discussed.

4. The three methods above are tested on speech consisting of twenty numbers, letters of the alphabet, and some sentences in Chinese, under three kinds of background: no noise, white noise at different SNRs, and babble noise. The results show that, without background noise, the approaches based on short-time energy with zero-crossing rate and on information entropy perform better than the approach based on frequency band variance. In noisy backgrounds, the approach based on frequency band variance and especially the approach based on information entropy perform much better than the approach based on short-time energy with zero-crossing rate.

5. Two improved spectral subtraction methods are proposed to address two shortcomings of the original method: it ignores the time-varying character of the noise, and it neglects how the noise differs across frequency channels. The subtraction coefficient of each frame is adjusted dynamically, either according to the stationarity of the noise in that frame or according to the auditory masking threshold. Both approaches approximate the real noise much more accurately, so the output of spectral subtraction is closer to the clean speech signal. Comparative experiments against the original spectral subtraction show that the improved approaches effectively remove musical noise and greatly increase the SNR. The approach based on the auditory masking threshold also takes the auditory characteristics of the human ear into account, so its output is more intelligible, which suggests a new way to handle the trade-off between SNR and intelligibility.
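To make the first contribution above more concrete, here is a minimal Python sketch of endpoint detection from short-time energy and zero-crossing rate, with the energy threshold derived from the mean energy of the leading silent frames. It is an illustration only, not the thesis's implementation: the function names, the frame length and hop, the threshold factors (`energy_factor`, `zcr_factor`), the assumption that the first `lead_frames` frames contain only background noise, and the bounded zero-crossing extension (`max_extend`) are all hypothetical choices made for this example.

```python
import math
import random


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(frame) - 1, 1)


def detect_endpoints(samples, frame_len=200, hop=100, lead_frames=5,
                     energy_factor=4.0, zcr_factor=1.5, max_extend=3):
    """Return (start_sample, end_sample) of the detected speech, or None.

    Assumption (hypothetical): the first `lead_frames` frames are silence,
    and thresholds are fixed multiples of the mean over those frames.
    """
    frames = frame_signal(samples, frame_len, hop)
    if len(frames) <= lead_frames:
        return None
    energies = [short_time_energy(f) for f in frames]
    zcrs = [zero_crossing_rate(f) for f in frames]

    # Mean (not minimum) energy of the silent frames sets the threshold.
    energy_thr = energy_factor * sum(energies[:lead_frames]) / lead_frames
    zcr_thr = zcr_factor * sum(zcrs[:lead_frames]) / lead_frames

    active = [i for i, e in enumerate(energies) if e > energy_thr]
    if not active:
        return None
    start, end = active[0], active[-1]

    # Extend the boundaries by a few frames while the zero-crossing rate
    # stays above its threshold (to catch weak, noise-like consonants).
    steps = 0
    while steps < max_extend and start > 0 and zcrs[start - 1] > zcr_thr:
        start -= 1
        steps += 1
    steps = 0
    while steps < max_extend and end < len(frames) - 1 and zcrs[end + 1] > zcr_thr:
        end += 1
        steps += 1

    return start * hop, end * hop + frame_len


def synthetic_example(seed=0):
    """Quiet noise, then a 200 Hz tone sampled at 8 kHz, then quiet noise."""
    rng = random.Random(seed)
    noise = lambda: rng.uniform(-0.01, 0.01)
    signal = [noise() for _ in range(2000)]
    signal += [math.sin(2 * math.pi * 200 * n / 8000) + noise() for n in range(4000)]
    signal += [noise() for _ in range(2000)]
    return signal


if __name__ == "__main__":
    print(detect_endpoints(synthetic_example()))
```

On the synthetic signal the detected boundaries should fall within one frame of the true tone onset and offset (samples 2000 and 6000). Real low-SNR speech would need tuned factors, and, as the abstract's results suggest, energy-based detection may degrade in noise compared with the band-variance and entropy approaches.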
Keywords/Search Tags:Endpoint Detection, Frequency Band Variance, Information Entropy Function, Speech Enhancement, Spectral Subtraction, Music Noise, Masking Model of Auditory System