Study On The Voice Activity Detection Method In Low SNR Environment

Posted on: 2020-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: L X Xia
Full Text: PDF
GTID: 2428330596477317
Subject: Control Science and Engineering
Abstract/Summary:
The State Council issued the Notice on the Development Plan for the New Generation of Artificial Intelligence, which requires the establishment of a key generic technology system for the new generation of artificial intelligence, including natural language processing technology. Voice activity detection is an important part of that technology. To address the low accuracy of voice activity detection in low-SNR environments, this thesis proposes three new voice activity detection methods and designs experiments to verify their feasibility and superiority.

First, speech signal preprocessing, traditional speech feature extraction, and endpoint detection methods are introduced, which provides the theoretical basis for the research. Then, in the feature extraction part, three speech feature extraction methods are proposed:

(1) The reasons why Spectral Entropy (SE) and Mel Frequency Cepstral Coefficients (MFCC) give poor voice activity detection results are analyzed, the influence of the first MFCC component (MFCC0) on MFCC in speech signal processing is explored, and the view that MFCC0 has a certain degree of speech tracking ability is put forward. The Product of Spectral Entropy and MFCC0 (PSEM) is then proposed by weighting SE with MFCC0. Finally, the advantages of PSEM are demonstrated by extracting features from speech signals and comparing them with SE and MFCC.

(2) For the feature extraction method based on Empirical Mode Decomposition (EMD) and the Teager energy operator (EMD-TEO), the long running time of the algorithm is attributed to the large number of EMD decompositions. Teager Energy Information Entropy (TEE) is proposed by introducing information entropy and improving the probability calculation, which reduces algorithm complexity and improves robustness. Extracting TEE features from speech signals and comparing them with EMD-TEO shows that TEE has better real-time performance and endpoint detection performance.

(3) The influence of the traditional probability calculation of Permutation Entropy (PE) on voice activity detection is explored, and it is pointed out that this calculation does not consider the mean value of each subsequence. Weighted Permutation Entropy (WPE) is proposed as the speech feature extraction method, and its superiority for voice activity detection is demonstrated by simulation experiments.

In the endpoint detection part, the Fuzzy C-Means Clustering (FCMC) algorithm and the Bayesian Information Criterion (BIC) are used to estimate the high and low thresholds of the traditional double-threshold method, which makes the double-threshold method adaptive. Finally, three new voice activity detection methods are formed by combining each of the three feature parameters with this adaptive double-threshold method. Comparison experiments designed on the TIMIT and NUST6032014 speech libraries show that the three proposed methods achieve higher endpoint detection accuracy than traditional voice activity detection methods in low-SNR environments.
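
The abstract gives no formulas, so the following is only a minimal sketch of two building blocks it names, written in plain Python. It assumes the standard discrete Teager energy operator and the commonly used definition of permutation entropy. The `weighted=True` branch weights each subsequence by its variance, which is one widely cited form of weighted permutation entropy and may differ from the WPE defined in the thesis. The function names and the `order` and `delay` parameters are illustrative, not taken from the thesis, and TEE itself is not reproduced here.

```python
import math
from collections import defaultdict


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def permutation_entropy(x, order=3, delay=1, weighted=False):
    """Normalized permutation entropy of a sequence, in [0, 1].

    With weighted=True, each subsequence contributes its variance instead
    of a count of 1. This weighting is an assumption, not the thesis's
    confirmed definition.
    """
    span = (order - 1) * delay
    mass = defaultdict(float)
    for i in range(len(x) - span):
        window = [x[i + j * delay] for j in range(order)]
        # Ordinal pattern: the indices that sort the window
        # (sorted() is stable, so ties are broken by position).
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        if weighted:
            mean = sum(window) / order
            weight = sum((v - mean) ** 2 for v in window) / order
        else:
            weight = 1.0
        mass[pattern] += weight
    total = sum(mass.values())
    if total == 0.0:
        return 0.0  # too short or constant input: no ordinal information
    entropy = -sum(
        (m / total) * math.log(m / total) for m in mass.values() if m > 0
    )
    return entropy / math.log(math.factorial(order))
```

In a frame-based detector, a feature like this would be computed once per frame of the preprocessed signal, and the resulting feature contour passed to the double-threshold stage. The variance weighting lets large-amplitude subsequences contribute more to the pattern distribution than small fluctuations, which is the usual motivation for weighting in low-SNR conditions.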
Keywords/Search Tags: voice activity detection, Product of Spectral Entropy and MFCC0 (PSEM), Weighted Permutation Entropy (WPE), Teager Energy Information Entropy (TEE), double-threshold method