With the arrival of the digital information era, people interact with machines more and more often in order to receive, process, and transfer information. Now that computers are in widespread use, natural communication between people and machines without a keyboard or mouse, a goal pursued for a long time, is becoming attainable. As the physiological mechanisms and features of human speech signals have become better understood, people increasingly expect to communicate with computers by speech instead of by clicking a mouse or typing on a keyboard. This kind of man-machine communication is an important research problem.

The multimedia era urgently demands that speech recognition systems move from the laboratory into practical use. Isolated word speech recognition systems can bring real benefits in daily life, but because of ambient noise their performance in products seldom meets expectations. Isolated word recognition is also the basis of other speech recognition techniques, so research on robust isolated word speech recognition is both significant and necessary.

Although isolated word recognition systems are fairly mature, many problems remain and further research is badly needed. This thesis focuses on the following problems of isolated word speech recognition systems:

(1) Speech endpoint detection. Studies have shown that, even in a quiet environment, more than 50% of the error rate of a system built on an isolated word recognizer can be attributed to the endpoint detector.

(2) Speech recognition in noisy environments. Ambient noise sharply reduces the recognition rate because it creates a mismatch between the training and test models.

Endpoint detection, which aims at separating speech segments from non-speech segments in a digital speech signal, is one of the key preprocessing components of automatic speech recognition (ASR) systems. Correctly locating the beginning and end of each utterance improves both the accuracy and the speed of recognition. The essential characteristics of an ideal endpoint detector are reliability, robustness, accuracy, adaptability, and simplicity; of these, robustness under unfavorable conditions has been the most difficult to achieve, so developing a robust endpoint detection algorithm is important for ASR. Many methods have been developed over the past several decades, and they fall roughly into two categories: algorithms based on thresholds and algorithms based on pattern recognition techniques. Threshold-based methods are simple and therefore widely used. To address endpoint detection in real-world noisy environments, this thesis proposes a new robust feature for endpoint detection. By using the circular average magnitude difference function for pitch period estimation and integrating it with the basic spectral entropy, the proposed feature can be applied in various noisy environments and requires no prior knowledge about the noise. Simulation results show that the feature effectively reduces the impact of several kinds of noise even at very low signal-to-noise ratios. A further refinement makes it robust against Volvo car noise: it has high amplitude at the beginning and end of speech and can detect fricatives and plosives, so the threshold is easy to set. Simulation results show that the new feature achieves a high endpoint detection rate in Volvo car noise.
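The abstract does not spell out the exact rule for fusing the circular average magnitude difference function (CAMDF) with the basic spectral entropy, so the following Python sketch only illustrates one plausible combination under stated assumptions: a CAMDF periodicity depth measured over the pitch-lag range is divided by the frame's spectral entropy, so that voiced speech (strong periodicity, low entropy) yields large values while broadband noise yields small ones. The fusion rule, the pitch search range, and the adaptive threshold in `detect_speech_frames` are illustrative assumptions, not the author's exact formulation.

```python
import numpy as np

def camdf(frame):
    """Circular average magnitude difference function of one frame."""
    n = len(frame)
    return np.array([np.mean(np.abs(np.roll(frame, k) - frame)) for k in range(n)])

def spectral_entropy(frame, eps=1e-12):
    """Basic spectral entropy: the normalized power spectrum treated as a pmf."""
    psd = np.abs(np.fft.rfft(frame)) ** 2
    p = psd / (psd.sum() + eps)
    return -np.sum(p * np.log(p + eps))

def endpoint_feature(frame, fs, f_lo=60.0, f_hi=500.0, eps=1e-12):
    """Hypothetical fusion: CAMDF periodicity depth in the pitch-lag range
    divided by spectral entropy (voiced speech -> deep valley, low entropy)."""
    c = camdf(frame)
    lag_lo, lag_hi = int(fs / f_hi), int(fs / f_lo)           # pitch-period search range
    valley = c[lag_lo:lag_hi]
    depth = (valley.max() - valley.min()) / (c.max() + eps)   # periodicity depth in [0, 1]
    return depth / (spectral_entropy(frame) + eps)

def detect_speech_frames(signal, fs, frame_s=0.025, hop_s=0.010, thresh=None):
    """Frame the signal, compute the feature, and apply a crude adaptive threshold."""
    n, h = int(frame_s * fs), int(hop_s * fs)
    feats = np.array([endpoint_feature(signal[i:i + n], fs)
                      for i in range(0, len(signal) - n + 1, h)])
    if thresh is None:
        thresh = feats.mean() + 0.5 * feats.std()             # assumption, not from the thesis
    return feats > thresh                                     # True = speech frame
```

In a complete detector the frame-level decisions would normally be smoothed with hangover and minimum-duration rules before the utterance boundaries are reported.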
Ambient noise turns the original speech signal into a noisy one, and the performance of many speech processing and recognition systems degrades when speech is corrupted in this way. The mismatch between the training environment and the test environment introduced by noise lowers recognition performance, so robustness is a key problem in speech recognition. In an HMM-based speech recognition system, this mismatch can be mapped into three spaces: the signal space, the feature space, and the model space. Corresponding to these spaces, anti-noise speech recognition techniques can be divided into three classes: speech enhancement, robust feature extraction, and model compensation. This thesis studies the problems in the signal space and the feature space.

The essential aim of algorithms in the signal space is to eliminate noise by estimating the clean speech; the main methods here are speech enhancement algorithms. Speech enhancement improves speech quality and intelligibility, with the basic idea of reducing noise as much as possible so that a relatively clean speech signal can be obtained. To address speech enhancement in impulsive noise, a multi-input speech enhancement algorithm based on signal subspace decomposition, and requiring no speech model, is presented. Using the covariation concept from array signal processing, the algorithm eigen-decomposes the covariation coefficient matrix of the speech signal corrupted by impulsive noise and thereby obtains a clean speech signal subspace, as sketched below. Simulation results show that the algorithm effectively reduces the impact of impulsive noise, Gaussian white noise, and Gaussian colored noise. Recognition experiments show that the enhanced speech improves the recognition rate at low signal-to-noise ratios; at high SNR the effect is less satisfactory, because of the subjective nature of speech enhancement evaluation and the loss of some information during noise removal.
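The abstract describes the enhancement step only at a high level, so the sketch below is a much simplified, hypothetical rendering of the idea: for an M-channel (multi-input) recording corrupted by impulsive noise, a covariation coefficient matrix is estimated with the commonly used fractional lower-order moment (FLOM) formula, the matrix is eigen-decomposed, and the channels are projected onto the dominant eigenvectors, which are assumed to span the speech subspace. The fractional order `p`, the symmetrization of the matrix, the rank of the retained subspace, and the final channel averaging are all assumptions made for illustration, not details taken from the thesis.

```python
import numpy as np

def covariation_coeff(x, y, p=1.2):
    """FLOM estimate of the covariation coefficient lambda(x, y) for impulsive
    (alpha-stable) noise, with fractional order 1 <= p < alpha (p is an assumption)."""
    y_p = np.sign(y) * np.abs(y) ** (p - 1.0)
    return np.mean(x * y_p) / (np.mean(np.abs(y) ** p) + 1e-12)

def covariation_matrix(channels, p=1.2):
    """Covariation coefficient matrix of an M-channel frame (M x N array)."""
    m = channels.shape[0]
    c = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            c[i, j] = covariation_coeff(channels[i], channels[j], p)
    return 0.5 * (c + c.T)            # symmetrize so the eigen-decomposition is real

def subspace_enhance(channels, rank=1, p=1.2):
    """Project the channels onto the dominant eigenvectors of the covariation matrix;
    the dominant subspace is assumed to carry the coherent speech component."""
    c = covariation_matrix(channels, p)
    w, v = np.linalg.eigh(c)          # eigenvalues in ascending order
    basis = v[:, -rank:]              # signal-subspace basis (M x rank)
    projector = basis @ basis.T       # projection matrix across channels
    projected = projector @ channels  # suppress components outside the subspace
    return projected.mean(axis=0)     # average the channels -> enhanced signal
```

In practice such processing would be applied frame by frame, with the retained rank chosen from the spread of the eigenvalues.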
Robust feature extraction is crucial in anti-noise speech recognition. At present MFCC, PLP, and MVDR features are widely used, and most robust features are based on them or combine them with other noise reduction methods such as spectral subtraction, cepstral mean subtraction, RASTA filtering, cepstral mean normalization, and feature vector normalization. Traditionally, feature extraction is computed over the full frequency band of speech; the major drawback of this approach is that even a partial, band-limited noise corruption affects every component of the feature vector. Multiband speech recognition addresses this problem by performing acoustic feature analysis independently on a set of frequency subbands, and combining the full-band features with the subband features avoids losing the correlation information among subbands. AMFCC features can be extracted from the one-sided high-lag autocorrelation sequence. A linear lifter has no effect on continuous-density HMM-based speech recognition; cepstral features derived from the differential power spectrum (DPS) address this issue and can be regarded as nonlinearly liftered cepstral coefficients.

This thesis proposes a new mel-frequency cepstral feature based on the circular autocorrelation function and the differential power spectrum. The method uses the correlation properties of the time-domain signal to suppress noise. Because the high-lag part of the autocorrelation function is not sensitive to noise, all lower-lag coefficients up to 3 ms are discarded. The Fourier transform of the resulting short-time autocorrelation sequence gives the signal power spectrum, and the magnitude of its differential power spectrum is passed through a mel-frequency filter bank. Speech recognition experiments with different noise types at different SNRs show that the recognition rate of the new feature approaches that of the other features at high SNR and exceeds them in pink and babble noise at low SNR.
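The processing chain of the proposed feature can be summarized in the following sketch, which follows the steps named in the text (circular autocorrelation, removal of lags below 3 ms, power spectrum, differential power spectrum, mel filter bank). The final log and DCT stages, the filter count, and the number of cepstral coefficients are conventional choices assumed here rather than taken from the thesis.

```python
import numpy as np
from scipy.fftpack import dct

def mel_filterbank(n_filters, n_fft, fs, f_lo=0.0, f_hi=None):
    """Standard triangular mel filter bank over an (n_fft//2 + 1)-point spectrum."""
    f_hi = f_hi or fs / 2.0
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(f_lo), mel(f_hi), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return fb

def dps_mel_cepstrum(frame, fs, n_ceps=13, n_filters=24, min_lag_ms=3.0):
    """Sketch of the proposed feature: circular autocorrelation with low lags
    (up to ~3 ms) zeroed out, spectrum of the remaining sequence, differential
    power spectrum (DPS), mel filter bank, then the usual log + DCT (assumed)."""
    n = len(frame)
    spec = np.fft.fft(frame, n)
    acf = np.real(np.fft.ifft(np.abs(spec) ** 2)) / n       # circular autocorrelation
    acf[:int(min_lag_ms * 1e-3 * fs)] = 0.0                 # discard noise-sensitive low lags
    n_fft = 2 * n
    power = np.abs(np.fft.rfft(acf, n_fft))                 # magnitude spectrum of high-lag ACF
    dps = np.abs(np.diff(power, append=power[-1]))          # |P(k+1) - P(k)|, magnitude of DPS
    energies = mel_filterbank(n_filters, n_fft, fs) @ dps   # mel filter bank energies
    return dct(np.log(energies + 1e-12), norm='ortho')[:n_ceps]
```

A usage example would frame the signal exactly as in a conventional MFCC front end (e.g., 25 ms windows with a 10 ms hop) and replace the MFCC computation of each frame with `dps_mel_cepstrum`.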