Currently, research on robust speech recognition focuses on resolving the mismatch between the test environment and the training environment, through methods such as robust speech feature extraction, feature compensation and model adaptation. These robust methods aim to keep the feature parameter model of the test data consistent with that of the training data. Related research shows that the human auditory system is capable of robust speech processing in degraded acoustic environments. Psychoacoustic research indicates that the perception process of the auditory system can be divided into two stages: first, segmentation of the speech signal, and second, organization of the individual components that belong to the same source, so that each source forms a continuous data stream. That is, auditory perception is in effect a reorganization of the different sources in the auditory scene: the components of the mixed sound signal that belong to the same sound source are grouped into one data stream, which yields a separate data stream for each sound source. Further processing is carried out on this basis. Motivated by this mechanism of human auditory signal processing, this thesis studies a robust recognition method based on speech separation. Speech separation based on spatial information is independent of the speech content and of the speaker, and it does not require a statistical model of the source signal parameters. After spatial separation, however, the feature parameters suffer from missing data. This thesis therefore combines speech separation based on spatial information with speech recognition based on missing data, and proposes an isolated word recognition algorithm that fuses spatial azimuth separation with missing data techniques. The main work of this thesis is as follows: (1) The basic principles of a speech recognition
system are studied, including preprocessing, feature extraction and the HMM speech model. The thesis analyzes representative robust speech recognition techniques in the signal space, the feature space and the model space, and introduces the missing data technique for speech recognition. (2) The speech feature parameters used in current robust recognition systems are analyzed, including the Mel frequency cepstral coefficient (MFCC) and the linear predictive cepstral coefficient (LPCC) in the cepstral domain. Based on the principles of the missing data technique, the thesis also analyzes two frequency-domain parameters: FBANK parameters based on Mel filter banks and subband RateMap parameters based on Gammatone filter banks. Simulation results with HMMs show that both parameters can be used for speech recognition based on missing data. (3) A speech recognition method based on spatial separation and the missing data technique is proposed. The speech separation algorithm based on spatial information generates a binary mask for each sound source by exploiting the sparsity of the speech signal. This hard decision causes some components of the target sound source to be lost. The thesis studies two algorithms for handling the missing data. The first, marginalization, ignores the missing part of the data entirely and uses only the present data for recognition. The second, data imputation, restores the missing part of the data through a specific method so that complete data are available for recognition. Simulation results show that both techniques significantly improve recognition performance at low SNR.
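
The two ways of handling missing data named in item (3) can be illustrated with a minimal sketch. It is not the algorithm of this thesis: it assumes a single diagonal-covariance Gaussian in place of an HMM state distribution, a hypothetical four-channel spectral frame, and the state mean as the imputation value. The names `marginal_log_likelihood` and `impute_with_means` are illustrative only.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def marginal_log_likelihood(features, mask, means, variances):
    """Marginalization: score only the channels the binary mask marks as present.

    With a diagonal covariance (an assumption of this sketch), integrating out
    the missing channels leaves the sum of the log densities of the present ones.
    """
    return sum(
        log_gauss(x, m, v)
        for x, keep, m, v in zip(features, mask, means, variances)
        if keep
    )


def impute_with_means(features, mask, means):
    """Data imputation: replace missing channels with the state mean.

    This is only one simple choice of "specific method"; the source text does
    not say which estimator is used.
    """
    return [x if keep else m for x, keep, m in zip(features, mask, means)]


# Hypothetical frame: channel 1 is dominated by an interfering source.
frame = [1.0, 5.0, 2.0, 0.5]
mask = [1, 0, 1, 1]            # binary mask: 1 = present, 0 = missing
means = [1.0, 1.5, 2.0, 0.5]   # assumed state means
variances = [1.0, 1.0, 1.0, 1.0]

marginal_score = marginal_log_likelihood(frame, mask, means, variances)
full_score = marginal_log_likelihood(frame, [1, 1, 1, 1], means, variances)
imputed_frame = impute_with_means(frame, mask, means)
```

In this toy frame the masked channel no longer penalizes the score under marginalization, while imputation returns a complete vector that a conventional recognizer could consume unchanged.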