
Research On Key Technologies Of Digitizing Speech Signal Processing

Posted on: 2022-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: S Bai
GTID: 2518306557465164
Subject: Optical Engineering
Abstract/Summary:
In recent years, digital speech signal processing has been studied in depth and widely applied. Current research focuses mainly on voice activity detection (VAD), speech noise reduction, speech recognition, and speech synthesis. Of these, voice activity detection is especially important because it serves as the front end of a speech signal processing system. However, because of complex background noise, imperfect detection algorithms, and other factors, detection accuracy is often unsatisfactory, and the stability of the overall system cannot be guaranteed. Improving the accuracy of voice activity detection is therefore of great significance for the stability of speech signal processing systems. This thesis first reviews traditional voice activity detection algorithms and then proposes an improved method that raises detection accuracy in low signal-to-noise ratio (SNR) environments. It then studies a speech de-noising algorithm based on Spectral Subtraction for low-SNR environments. Finally, it designs an isolated-word speech recognition system that combines voice activity detection with speech de-noising. The main work is as follows:

(1) A voice activity detection algorithm based on the Mel Energy Ratio is proposed for low-SNR environments. Traditional voice activity detection algorithms cannot guarantee high accuracy at low SNR. To address this, and building on an analysis of how Mel-Frequency Cepstral Coefficients (MFCC) and short-time energy are used in voice activity detection, this thesis proposes a method that sums the first three MFCC components (MFCCa) and forms a ratio between that sum and the short-time energy (the Mel Energy Ratio), which serves as the speech feature parameter. The thresholds of the double-threshold method are then determined adaptively with the fuzzy c-means clustering algorithm. Experimental results show that, across different types of low-SNR noise environments, the accuracy of this algorithm is about 30% higher than that of other traditional algorithms. In addition, this thesis applies a neural network to voice activity detection and designs a hybrid detection model that combines a neural network with speech feature parameters, in order to compare endpoint detection accuracy across different feature parameters. The results show that using the Mel Energy Ratio as the feature parameter yields about 10% higher accuracy than the other feature parameters, which further supports the advantage of the Mel Energy Ratio.

(2) A speech de-noising algorithm based on Spectral Subtraction is proposed for low-SNR environments. At present, the more successful speech de-noising algorithms include Spectral Subtraction, Wiener filtering, and LMS adaptive filtering. Spectral Subtraction is the most commonly used, but it suffers from problems such as misjudgment of non-speech segments and inaccurate noise estimation. The algorithm proposed here uses high-accuracy voice activity detection to estimate the noise accurately, updates the background noise spectrum adaptively, and thereby improves the de-noising effect. Experimental results show that the SNR of speech de-noised by this method is 1 dB higher than with the traditional Spectral Subtraction method.

(3) A speech recognition system based on DTW is designed. Building on the voice activity detection algorithm and the speech de-noising algorithm above, this thesis improves the traditional Dynamic Time Warping (DTW) speech recognition algorithm and designs a new speech recognition system. The system improves both the accuracy of speech recognition and the stability of the system in low-SNR environments.
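
As an illustration of the double-threshold decision mentioned in item (1), the following is a minimal sketch, not the thesis implementation. It assumes a per-frame feature value (for example a Mel Energy Ratio) has already been computed, and it takes the two thresholds as plain arguments; in the thesis they are derived adaptively by fuzzy c-means clustering, which is not reproduced here. The function name `double_threshold_vad` and its parameters are hypothetical.

```python
from typing import List, Sequence


def double_threshold_vad(feature: Sequence[float], low: float, high: float) -> List[bool]:
    """Label each frame as speech (True) or non-speech (False).

    A frame whose feature exceeds `high` seeds a speech segment; the
    segment is then extended in both directions for as long as the
    feature stays above `low`. Both thresholds are assumed inputs.
    """
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")

    n = len(feature)
    speech = [False] * n
    i = 0
    while i < n:
        if feature[i] > high:
            # Extend backwards while the feature stays above the low threshold.
            start = i
            while start > 0 and feature[start - 1] > low:
                start -= 1
            # Extend forwards while the feature stays above the low threshold.
            end = i
            while end + 1 < n and feature[end + 1] > low:
                end += 1
            for k in range(start, end + 1):
                speech[k] = True
            i = end + 1
        else:
            i += 1
    return speech
```

A frame that only exceeds the low threshold, without a neighbouring frame above the high threshold, is left as non-speech; this is what makes the scheme less sensitive to short noise bursts than a single threshold.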
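
The idea in item (2), updating the background noise spectrum only in frames that voice activity detection marks as non-speech, can be sketched as follows. This is a simplified magnitude-domain variant under stated assumptions, not the algorithm from the thesis: the input is a list of per-frame magnitude spectra, the first frame is assumed to be noise-only, and the smoothing factor `alpha` and spectral floor `floor` are illustrative values chosen here.

```python
from typing import List, Sequence


def spectral_subtract(
    frames_mag: Sequence[Sequence[float]],
    is_speech: Sequence[bool],
    alpha: float = 0.9,   # assumed smoothing factor for the noise update
    floor: float = 0.02,  # assumed spectral floor, as a fraction of the noisy magnitude
) -> List[List[float]]:
    """Subtract a VAD-driven running noise estimate from each frame's magnitude spectrum."""
    if len(frames_mag) != len(is_speech):
        raise ValueError("frames_mag and is_speech must have the same length")
    if not frames_mag:
        return []

    # Initialise the noise estimate from the first frame (assumed noise-only).
    noise = list(frames_mag[0])
    cleaned: List[List[float]] = []
    for mag, speech in zip(frames_mag, is_speech):
        if not speech:
            # Non-speech frame: update the noise spectrum by exponential smoothing.
            noise = [alpha * n + (1.0 - alpha) * m for n, m in zip(noise, mag)]
        # Subtract the noise estimate, clamping to a small floor so that
        # magnitudes never become negative.
        cleaned.append([max(m - n, floor * m) for m, n in zip(mag, noise)])
    return cleaned
```

Because the noise estimate is only refreshed in frames flagged as non-speech, errors in the detection step feed directly into the noise estimate, which is why the abstract ties the de-noising result to detection accuracy.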
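
For item (3), the textbook form of Dynamic Time Warping for isolated-word recognition by template matching can be sketched as below. This shows the traditional algorithm only; the improvements made in the thesis are not described in the abstract and are not reproduced. The function names, the Euclidean frame distance, and the template dictionary are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence

Frames = Sequence[Sequence[float]]  # one feature vector per frame


def dtw_distance(a: Frames, b: Frames) -> float:
    """Accumulated cost of the best monotonic alignment between two feature sequences."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")

    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognise(query: Frames, templates: Dict[str, Frames]) -> str:
    """Return the label of the template with the smallest DTW distance to the query."""
    return min(templates, key=lambda word: dtw_distance(query, templates[word]))
```

A usage example with one-dimensional toy features: `recognise([[0.0], [0.0], [1.0], [2.0], [2.0]], {"yes": [[0.0], [1.0], [2.0]], "no": [[5.0], [5.0], [5.0]]})` selects `"yes"`, since the query is a time-stretched copy of that template.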
Keywords/Search Tags: VAD, Mel Energy Ratio, Speech noise reduction, Spectral Subtraction, Speech Recognition