
Noise-robust Auditory Feature Extraction And Optimization For Speech Recognition

Posted on: 2020-03-03   Degree: Master   Type: Thesis
Country: China   Candidate: Y Y Shi   Full Text: PDF
GTID: 2428330596985789   Subject: Information and Communication Engineering
Abstract/Summary:
As the material shell and acoustic representation of language, speech is one of the most easily accessible information carriers for human beings. It carries and conveys many kinds of information and is a research topic in human-computer interaction and intelligent communication. Speech recognition is a technology for realizing intelligent human-computer interaction, with broad application prospects and value. Its main purpose is to let people communicate with a computer by having the computer convert the speech signal into commands it can understand. A complete speech recognition system includes feature extraction and pattern recognition. As an important part of speech recognition, feature extraction has a great impact on system performance. Ideal feature parameters should be highly robust in complex environments. The key to speech recognition is how to extract effective feature parameters that fully characterize semantic information, weaken speaker-specific characteristics, are easy to classify, and remain stable, and thereby further improve the recognition rate. Building on previous studies and on the research status and background of speech recognition, this paper introduces the research trends of speech recognition and speech feature parameters in detail. To address the incomplete characterization of semantic information by current feature parameters and the decline of recognition performance in noisy environments, the paper studies and experimentally verifies different aspects of the speech recognition system from three angles: feature extraction, feature fusion, and feature optimization. The main research work of the paper is as follows:

(1) The composition of a speech recognition system was overviewed. First, three digital models of the speech signal were introduced. Second, the basic principles and classification of speech recognition and its performance evaluation indices were elaborated in detail. Then, the three modules of a speech recognition system were presented: the preprocessing of the speech signal and its operational details, the extraction principles of common feature parameters, and the main recognition models. Finally, speech recognition technology was summarized.

(2) Building on the extraction process of Cochlear Filter Cepstral Coefficients (CFCC), the CFCCIF feature was extracted by incorporating instantaneous frequency information. A New Cochlear Filter Cepstral Coefficients (NCFCC) feature was extracted using a power-law nonlinear function that can simulate the auditory characteristics of the human ear, and the effects of different nonlinear transformations on the performance of CFCC were studied in depth. The robustness of the NCFCC feature was validated by comparing recognition results under different SNR conditions on the same speech database.

(3) To address the performance degradation of speech recognition systems in noisy environments, speech enhancement was applied in the front-end processing of the speech signal on the basis of the NCFCC feature, combining speech enhancement with feature extraction. Different speech enhancement methods were used in the front end of feature extraction, and three new robust feature parameters were proposed: Fusion Feature Based on Power-law Nonlinearity Function and Spectral Subtraction (FFPSS), Fusion Feature Based on Power-law Nonlinearity Function and Recursive Least Square (FFPRLS), and Fusion Feature Based on Power-law Nonlinearity Function and Least Mean Square (FFPLMS). The validity of combining speech enhancement with feature extraction was verified, and the recognition rate of the speech recognition system based on these three features was improved.

(4) From the perspective of speech enhancement, the energy tracking transformation characteristics of noisy speech were analyzed, and Teager Energy Operators Cepstral Coefficients (TEOCC) were then extracted. Because single-type features are not enough to characterize the complete characteristics of speech signals, the optimization effect of dynamic and static combination features over a single static feature was first verified experimentally. The energy feature TEOCC was then combined with them to form a fusion feature set. It was verified that TEOCC can compensate the human auditory cepstral feature, and confirmed that the fusion feature set can effectively improve the robustness of the recognition network.

(5) To address the large amount of data and high computational complexity of the fusion feature set, a feature optimization method based on principal component analysis (PCA) was proposed. First, the feasibility of the method was validated by a feature optimization pre-experiment based on the dynamic and static combination feature. Then, a feature set optimization and comparison experiment was performed on the fusion feature set with the energy feature, yielding the optimized speech feature parameter set. Finally, a comparison experiment based on the optimized feature set verified that it can further improve the performance of the speech recognition system, which demonstrates the feasibility and effectiveness of the method.
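Two of the building blocks named in the abstract, the Teager energy operator behind TEOCC and the power-law nonlinearity behind NCFCC, can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no formulas or parameter values, so the sketch assumes the standard discrete form of the operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], and an exponent of 1/15 chosen purely for illustration. The function names `teager_energy` and `power_law` are hypothetical.

```python
import math


def teager_energy(x):
    """Standard discrete Teager energy operator (assumed form):
    psi[n] = x[n]^2 - x[n-1] * x[n+1], defined for interior samples only."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def power_law(energies, exponent=1.0 / 15.0):
    """Power-law compression of non-negative energies.
    The exponent is an illustrative assumption, not a value from the thesis.
    Negative inputs are clipped to zero so the result stays real."""
    return [max(e, 0.0) ** exponent for e in energies]


# For a pure tone A*cos(omega*n), the operator returns the constant
# A^2 * sin(omega)^2, so it tracks both amplitude and frequency.
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(64)]
psi = teager_energy(tone)
expected = amplitude ** 2 * math.sin(omega) ** 2

compressed = power_law(psi)
```

Unlike a logarithm, the power-law function is defined at zero energy, which is one commonly cited reason for preferring it in noise-robust front ends. Whether the thesis uses it for that reason is not stated in the abstract.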
Keywords/Search Tags: speech recognition, power-law nonlinearity function, cochlear filter cepstral coefficients, speech enhancement method, Teager energy operators cepstral coefficients, feature optimization