The Study Of Feature Extraction And Acoustic Modeling In Speech Recognition System

Posted on: 2013-01-23  Degree: Master  Type: Thesis
Country: China  Candidate: M M Zhao  Full Text: PDF
GTID: 2218330374961654  Subject: Systems analysis and integration
Abstract/Summary:
The basic process of speech recognition includes preprocessing, speech enhancement, speech denoising, speech segmentation and feature extraction, the acoustic model, and the language model. After introducing this process, this thesis takes feature extraction and acoustic modeling as its main subjects. It analyzes the general methods of speech enhancement and denoising and discusses the idea of building a real-time online noise database to support speech enhancement and noise elimination. It introduces endpoint detection methods and their important role in speech recognition. It summarizes in detail the various characteristic parameters, the methods used to extract them, and their important role in speech recognition. It then analyzes two improved extraction methods for Mel-Frequency Cepstral Coefficients (MFCC). The first is called Bark wavelet Mel-frequency cepstral coefficients (BMFCC): the Bark wavelet transform is embedded into the MFCC parameter extraction process. Because the speech signal changes rapidly and is stable only over short intervals, these coefficients can reflect more information about the spectral characteristics of the speech signal than standard MFCC parameters. The second is called Critical Frequency Band and Wavelet Transform MFCC (WMFCC). This method builds a new auditory filter bank from a wavelet filter bank matched to the frequency response of human hearing and uses it in place of the original one. Compared with the original MFCC auditory filter bank, the new critical-band wavelet filter bank better reflects the working mechanism of the cochlea.
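As a point of reference for the two improved methods, the standard MFCC pipeline (framing, windowing, power spectrum, mel filter bank, logarithm, DCT) can be sketched as below. This is a minimal NumPy sketch of baseline MFCC only, not the BMFCC or WMFCC method of the thesis. The frame length, hop size, FFT size, filter count, and all function names are assumptions chosen for illustration. The `mel_filterbank` stage is the one an auditory filter bank such as the WMFCC wavelet bank would replace.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (one of several conventions in use).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose centers are equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising slope
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # Assumed defaults: 25 ms frames with a 10 ms hop at 16 kHz.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)  # small floor avoids log(0)
    # Unnormalized DCT-II over the filter axis, keeping the first n_ceps terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n[None, :] + 1)
                   / (2 * n_filters))
    return log_energies @ basis.T

# Usage on one second of synthetic noise (a stand-in for real speech).
features = mfcc(np.random.default_rng(0).standard_normal(16000))
print(features.shape)
```

Each row of `features` holds the cepstral coefficients of one frame.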
Based on extensive study of feature parameter extraction, I propose a novel characteristic parameter named Linear Predictive Residual Phase Cepstrum Coefficients (RPCC). The RPCC extraction process fuses the residual phase features with the LPCC parameters by linear superposition, which improves the ability of Linear Predictive Cepstrum Coefficients (LPCC) to reflect the differences between speech units. The effectiveness of this parameter is demonstrated in the experiments that follow. This thesis also gives a detailed analysis of the various hidden Markov models (HMM). Building on a study of existing acoustic models, I propose a novel acoustic recognition model, the Nonhomogeneous Semi-continuous Hidden Markov Model (NSCHMM). Compared with the standard HMM, NSCHMM not only describes the observed symbols more accurately but also improves the description of the states in the Markov chain. Whereas a continuous HMM gives each observed symbol its own complete Gaussian description, NSCHMM describes observed symbols through shared feature vectors, which simplifies the model. And whereas a homogeneous HMM uses a geometric distribution to describe the duration distribution of a hidden state, NSCHMM uses a Gaussian distribution. Statistics gathered from speech material show that the durations of speech units do not follow a geometric distribution but are closer to a Gaussian distribution, a uniform distribution, and so on. NSCHMM, which uses a Gaussian distribution to describe hidden-state durations, is therefore more suitable for speech recognition. In the subsequent continuous speech recognition experiment, the validity of this model is demonstrated by comparison with the homogeneous HMM.
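The difference between the two state-duration models can be illustrated numerically. In a homogeneous HMM, a state with self-transition probability a is occupied for exactly d frames with probability a^(d-1)(1-a), which is always largest at d = 1. A Gaussian duration model instead peaks near its mean. The sketch below is only an illustration under assumed values (a = 0.9, a Gaussian with mean 10 and standard deviation 3, durations truncated at 50 frames); the function names are hypothetical and it is not the NSCHMM formulation itself.

```python
import numpy as np

def geometric_duration_pmf(self_loop_prob, max_duration):
    # Duration distribution implied by a fixed self-transition probability.
    d = np.arange(1, max_duration + 1)
    return self_loop_prob ** (d - 1) * (1.0 - self_loop_prob)

def gaussian_duration_pmf(mean, std, max_duration):
    # Gaussian shape sampled at integer durations and renormalized.
    d = np.arange(1, max_duration + 1)
    weights = np.exp(-0.5 * ((d - mean) / std) ** 2)
    return weights / weights.sum()

# Both models are set to an expected duration of about 10 frames:
# the untruncated geometric mean is 1 / (1 - 0.9) = 10.
geo = geometric_duration_pmf(0.9, 50)
gau = gaussian_duration_pmf(10.0, 3.0, 50)

# Most likely duration under each model (durations start at 1).
print(int(np.argmax(geo)) + 1, int(np.argmax(gau)) + 1)  # 1 10
```

Even with matched expected durations, the geometric model treats a one-frame stay as the most likely outcome, while the Gaussian model concentrates probability around the typical duration.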
Considering the inherent weakness of the HMM on confusable speech units, the final part of the thesis discusses a two-level decision-making speech recognition system based on the NSCHMM and a Support Vector Machine (SVM) model. It also analyzes the application prospects of a joint acoustic recognition model based on the NSCHMM, confidence measures, and an SVM improved by Dynamic Time Warping (DTW), the directed acyclic graph (DAG), and the one-vs-rest (1VR) method.
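One plausible reading of the two-level decision scheme is sketched below: the first-level model scores every candidate, and the second-level classifier is consulted only when the top two candidates are too close to call. The abstract does not specify the confidence measure or the decision rule, so the score margin, the threshold value, the function names, and the stand-in second-stage classifier are all assumptions for illustration.

```python
from typing import Callable, Dict, Sequence

def two_level_decide(
    first_pass_scores: Dict[str, float],
    second_stage: Callable[[Sequence[float], str, str], str],
    features: Sequence[float],
    margin_threshold: float = 2.0,
) -> str:
    # Rank candidates by first-level score (e.g. HMM log-likelihood).
    ranked = sorted(first_pass_scores, key=first_pass_scores.get, reverse=True)
    if len(ranked) == 1:
        return ranked[0]
    best, runner_up = ranked[0], ranked[1]
    margin = first_pass_scores[best] - first_pass_scores[runner_up]
    if margin >= margin_threshold:
        # Confident first-level decision: accept it directly.
        return best
    # Confusable pair: defer to the second-level classifier (e.g. an SVM).
    return second_stage(features, best, runner_up)

def toy_second_stage(features, label_a, label_b):
    # Hypothetical stand-in for a trained pairwise classifier.
    return label_a if sum(features) >= 0 else label_b

# Clear winner: the first-level result is kept.
print(two_level_decide({"ba": -10.0, "pa": -15.0}, toy_second_stage, [0.3]))
# Close scores: the second level decides between "ba" and "pa".
print(two_level_decide({"ba": -10.0, "pa": -10.5}, toy_second_stage, [-0.3]))
```

The design keeps the second-level classifier off the common path, so its cost is paid only for the confusable cases.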
Keywords/Search Tags: Speech Recognition, Improved MFCC, RPCC, Nonhomogeneous Semi-continuous Hidden Markov Model (NSCHMM)