Research On Dual-modal Anti-noise Feature Extraction Of Fuzzy Speech

Posted on: 2022-07-16
Degree: Master
Type: Thesis
Country: China
Candidate: X J Feng
GTID: 2518306542980839
Subject: Electronics and Communications Engineering
Abstract/Summary:
Driven by advances in artificial intelligence, speech recognition technology is once again developing rapidly. People increasingly expect to communicate naturally with intelligent machines in everyday life, so that a machine can understand spoken language and respond correctly to the instructions it is given. For now, however, technical challenges remain in moving speech recognition out of the laboratory and into daily life. In environments with little or no noise, speech recognition systems perform well and achieve high recognition rates. When the background noise is strong or the acoustic environment is complex, performance falls short of what is achieved in a quiet laboratory. It is therefore of great significance to study the noise robustness of speech recognition systems in complex environments. Mandarin Chinese contains fuzzy speech sounds that are similar in pronunciation mechanism, easily confused by listeners, and easily misrecognized by intelligent machines. Because the performance of a speech recognition system depends largely on the choice of speech feature parameters, this thesis analyzes and studies the anti-noise features of fuzzy speech from two perspectives: the pronunciation mechanism and auditory characteristics.

1. With the help of the research group, a German-made three-dimensional electromagnetic articulograph was used to collect the movement trajectories of the vocal organs, while professional recording equipment synchronously recorded the speech audio. A dual-modal fuzzy speech data set meeting the research requirements was established, comprising two modalities: speech audio signals and articulator movement signals. This data set serves as the experimental sample for the feature extraction research.

2. The speech recognition system used in the experiments is introduced, its processing framework is described, and its two key steps, feature extraction and the classification network, are analyzed in detail. Four prosodic features are then introduced: pitch frequency, short-term mean energy, short-term mean amplitude, and formants. Three classifiers, the Artificial Neural Network (ANN), Random Forest (RF), and Support Vector Machine (SVM), are compared and analyzed. Given the size of the self-built data set, the SVM is chosen as the classifier of the recognition system.

3. Starting from the acoustic information, the Cochlear Filter Cepstral Coefficients (CFCC) are improved by combining them with different nonlinear transformations. Since speech is a non-stationary, time-varying signal, the short-time Fourier transform and the wavelet transform are compared as tools for handling non-stationary signals, the S transform is introduced to perform time-frequency conversion of the speech signal, and singular value decomposition (SVD) is used to suppress broadband random noise. The result is a new acoustic feature parameter, the S-transform cochlear filter cepstral coefficients (ST-CFCC).

4. Starting from the pronunciation mechanism of speech, the movement trajectories of the speech organs are analyzed, and the tongue and mandible are selected for extracting articulatory movement features. Articulatory cepstral coefficients (ACCs), the cepstral coefficients of the time-position articulatory signals, are also proposed as movement features. The feasibility and classification results of the different movement features are compared and analyzed.

5. At the feature level, dual-modal fusion of acoustic and kinematic features is studied. Kernel principal component analysis and linear canonical correlation analysis are combined to reduce the dimensionality of each modality's features and to perform cross-modal feature fusion. Comparison experiments designed around the fused feature vectors further verify the advantages of the dual-modal fused features over single-modal features and the effectiveness of the fusion method, which improves the performance of the speech recognition system.
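
Two of the prosodic features named in the abstract, short-term mean energy and short-term mean amplitude, are simple enough to illustrate directly. The following is a minimal sketch using the common textbook definitions (mean of squared samples and mean of absolute samples per frame). The function names, the rectangular framing, and the frame and hop lengths are assumptions made for illustration; the abstract does not specify the thesis's framing or windowing choices.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; an incomplete tail frame is dropped."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_mean_energy(samples, frame_len, hop):
    """Mean of the squared sample values in each frame."""
    return [sum(x * x for x in frame) / frame_len
            for frame in frame_signal(samples, frame_len, hop)]


def short_time_mean_amplitude(samples, frame_len, hop):
    """Mean of the absolute sample values in each frame."""
    return [sum(abs(x) for x in frame) / frame_len
            for frame in frame_signal(samples, frame_len, hop)]


if __name__ == "__main__":
    # Hypothetical test signal: 0.1 s of silence followed by 0.1 s of a 200 Hz tone.
    sample_rate = 8000
    silence = [0.0] * 800
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
    signal = silence + tone

    # 25 ms frames with a 12.5 ms hop (assumed values, not taken from the thesis).
    energy = short_time_mean_energy(signal, frame_len=200, hop=100)
    amplitude = short_time_mean_amplitude(signal, frame_len=200, hop=100)
    for e, a in zip(energy, amplitude):
        print(f"energy={e:.4f}  amplitude={a:.4f}")
```

Both features rise when the tone begins, which is why they are often used to separate speech from silence. The energy weights large samples more heavily than the amplitude does, so it is the more sensitive of the two to loud, impulsive noise.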
Keywords/Search Tags: fuzzy speech recognition, acoustic features, articulatory movement features, S transform, dual-modal feature fusion
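
For reference, the S transform listed in the keywords is usually defined as below. This is the standard Stockwell form; the abstract does not state whether the thesis uses this exact form or a generalized variant, and the symbols $x(t)$ (the speech signal), $\tau$ (time shift), and $f$ (frequency) are chosen here for illustration.

```latex
S(\tau, f) = \int_{-\infty}^{\infty} x(t)\,
  \frac{|f|}{\sqrt{2\pi}}\,
  e^{-\frac{(\tau - t)^{2} f^{2}}{2}}\,
  e^{-i 2\pi f t}\, dt
```

The Gaussian window has a width that scales as $1/|f|$, so it narrows as frequency increases. This gives the S transform the frequency-dependent resolution of a wavelet transform while keeping the absolutely referenced phase of the short-time Fourier transform, the two methods the abstract compares it against.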