Noise-robust Auditory Feature Extraction And Optimization For Speech Recognition

Posted on:2020-03-03

Degree:Master

Type:Thesis

Country:China

Candidate:Y Y Shi

Full Text:PDF

GTID:2428330596985789

Subject:Information and Communication Engineering

Abstract/Summary:

As the material shell and acoustic representation of language, speech is one of the most easily accessible information carriers for human beings. It carries and conveys many kinds of information and is a research topic in human-computer interaction and intelligent communication. Speech recognition is a technology for intelligent human-computer interaction with broad application prospects and value; its main purpose is to let people communicate with the computer by converting the speech signal into commands the computer can understand. A complete speech recognition system includes feature extraction and pattern recognition. As an important part of speech recognition, feature extraction has a great impact on system performance. Ideal feature parameters should be highly robust in complex environments. The key to speech recognition is to extract effective feature parameters that fully characterize semantic information, weaken the speaker's individual characteristics, are easy to classify, and are stable, and thereby to further improve the recognition rate. Building on previous studies, this paper reviews the research status, background, and significance of speech recognition and introduces the research trends in speech recognition and speech feature parameters in detail. To address the incomplete characterization of semantic information by current feature parameters and the decline of recognition performance in noisy environments, the paper studies three aspects of the speech recognition system in depth, namely feature extraction, feature fusion, and feature optimization, and verifies them experimentally. The main research work of the paper is as follows:

(1) The composition of a speech recognition system is reviewed. First, three digital models of the speech signal are introduced. Second, the basic principles and classification of speech recognition and its performance evaluation indices are described in detail. Then the three modules of a speech recognition system are presented: the preprocessing of the speech signal and its operational details, the extraction principles of common feature parameters, and the main recognition models. Finally, speech recognition technology is summarized.

(2) Based on the extraction process of Cochlear Filter Cepstral Coefficients (CFCC), the CFCCIF feature, which incorporates instantaneous frequency information, is extracted. A New Cochlear Filter Cepstral Coefficients (NCFCC) feature is extracted using a power-law nonlinear function that can simulate the auditory characteristics of the human ear, and the effects of different nonlinear transformations on the performance of CFCC are studied in depth. The robustness of the NCFCC feature is validated by comparing recognition results under different SNR conditions on the same speech database.

(3) To address the performance degradation of speech recognition systems in noisy environments, speech enhancement is applied in the front-end processing of the speech signal, ahead of NCFCC feature extraction, so that speech enhancement and feature extraction are combined. Different speech enhancement methods are used in the front end of feature extraction, and three new robust feature parameters are proposed: Fusion Feature Based on Power-law Nonlinearity Function and Spectral Subtraction (FFPSS), Fusion Feature Based on Power-law Nonlinearity Function and Recursive Least Square (FFPRLS), and Fusion Feature Based on Power-law Nonlinearity Function and Least Mean Square (FFPLMS). The validity of combining speech enhancement with feature extraction is verified, and the recognition rate of the speech recognition system based on these three features is improved.

(4) From the perspective of speech enhancement, the energy tracking characteristics of noisy speech are analyzed, and Teager Energy Operators Cepstral Coefficients (TEOCC) are extracted. A single type of feature is not enough to characterize the complete characteristics of speech signals. First, an experiment verifies that combined dynamic and static features improve on a single static feature. Then the energy feature TEOCC is added to form a fusion feature set. It is verified that TEOCC can complement the human-ear auditory cepstral feature, and it is confirmed that the fusion feature set effectively improves the robustness of the recognition network.

(5) To address the large data volume and high computational complexity of the fusion feature set, a feature optimization method based on principal component analysis (PCA) is proposed. First, the feasibility of the method is validated by a feature optimization pre-experiment on the combined dynamic and static features. Then a feature set optimization and comparison experiment is performed on the fusion feature set with the energy feature, yielding an optimized set of speech feature parameters. Finally, a comparison experiment on the optimized feature set verifies that it further improves the performance of the speech recognition system, which demonstrates the feasibility and effectiveness of the method.
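As a rough illustration of three operations named in the abstract (the Teager energy operator, a power-law nonlinearity, and the combination of static with dynamic features), the following minimal sketch may help. The abstract gives no formulas or parameter values, so everything specific here is an assumption: the function names are hypothetical, the exponent 1/15 is only a placeholder and not the thesis's value, and the two-point difference is just one common way to form dynamic features. The Teager operator uses the standard discrete form psi[n] = x[n]^2 - x[n-1]*x[n+1].

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values because the two end samples lack a neighbour.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def power_law(values, exponent=1.0 / 15.0):
    """Power-law compression of non-negative energies.

    The default exponent is a placeholder assumption, not taken from the thesis.
    Negative inputs are clipped to zero before compression.
    """
    return [max(v, 0.0) ** exponent for v in values]


def delta(frames):
    """Dynamic features as a two-point difference: d[t] = (c[t+1] - c[t-1]) / 2.

    The first and last frames reuse themselves as the missing neighbour.
    """
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, last)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def static_plus_dynamic(frames):
    """Concatenate each static feature vector with its dynamic (delta) vector."""
    return [c + d for c, d in zip(frames, delta(frames))]


if __name__ == "__main__":
    # For a pure tone A*cos(w*n), the Teager energy is the constant A^2 * sin(w)^2.
    tone = [2.0 * math.cos(0.3 * n) for n in range(50)]
    energy = teager_energy(tone)
    print(energy[0], 4.0 * math.sin(0.3) ** 2)
    print(power_law(energy[:3]))
    print(static_plus_dynamic([[0.0], [1.0], [4.0]]))
```

For a sinusoid the operator output depends on both amplitude and frequency, which is why it is often described as tracking signal energy. How the thesis turns these energies into cepstral coefficients (TEOCC) is not stated in the abstract and is not reproduced here.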

Keywords/Search Tags:

speech recognition, power-law nonlinearity function, cochlear filter cepstral coefficients, speech enhancement method, teager energy operators cepstral coefficients, feature optimization

Related items

1	Cochlear Filter Cepstral Feature In Speech Recognition
2	Study Of Methods Of Speech Features Extraction Of Ando Tibetan
3	Estimation of cepstral coefficients for robust speech recognition
4	Study On Deep Learning-Based Speech Quality Assessment
5	Hidden Markov Model Based Automatic Speech Recognition Using Mel Frequency Cepstral Coefficients In Nepalese
6	Anti-noise Power Normalized Cepstral Coefficients For Two-level Robust Environmental Sounds Recognition In Real Noisy Conditions
7	Research For Algorithm Of Speech Recognition Based On WD/HMM
8	Research On Continuous Speech Recognition Technology Based On HMM
9	Comprehensive Analysis And Application Of Template Matching Algorithm Based On Feature Extraction Of Speech Signal
10	Speech Recognition Speed Up Research Based On MFCC