Optimization On VTS Feature Compensation For Speech Recognition

Posted on: 2017-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: H J Li
GTID: 2348330491962750
Subject: Information and Communication Engineering
Abstract/Summary:
The performance of speech recognition systems degrades rapidly in real applications because of environmental noise. Vector Taylor series (VTS) is a model-based feature compensation algorithm that is robust to noise. VTS can effectively reduce the mismatch between training and testing conditions and thereby improve recognition accuracy. This thesis optimizes the structure of VTS-based feature compensation. The optimization comprises a two-layer GMM (Gaussian mixture model) VTS structure and initial value optimization of the noise parameter estimation for multi-environment VTS (MEVTS), with the aim of improving both the recognition speed and the recognition performance of the speech recognition system. The main work of this thesis is summarized as follows:

(1) Analysis of the structure of a robust speech recognition system. This thesis focuses on the key technologies of robust speech recognition, including voice activity detection (VAD) based on weighted band spectral entropy, VTS-based feature compensation, and the acoustic feature models (GMM for feature compensation and HMM for pattern recognition).

(2) Two-layer GMM based VTS optimization. Since VTS dominates the computational complexity of the speech recognition system, this thesis proposes a two-layer GMM VTS structure. VTS is divided into a noise parameter estimation stage and a clean acoustic feature estimation stage, and each stage uses a GMM with a different number of mixtures. Specifically, the GMM with fewer mixtures is used in the noise parameter estimation stage, while the GMM with more mixtures is used for clean feature estimation. Simulation results show that the proposed structure significantly reduces the computational complexity of the recognition system while maintaining its performance.

(3) Initial value optimization of the noise parameter for multi-environment VTS (MEVTS). The MEVTS algorithm selects, from a set of basic models, the acoustic feature model that best matches the current environment. This effectively reduces the mismatch between the training environment and the testing environment. In addition, setting the initial value of the noise parameter estimation according to the selected optimal GMM helps keep the expectation-maximization (EM) algorithm from converging to a local optimum and lets it reach a more accurate estimate in fewer iterations, which improves speech recognition performance.

(4) Realization of the optimized VTS-based speech recognition systems on the MATLAB and C platforms. A large number of tests were run to verify the effect of the proposed optimized VTS algorithm. Experiments show that the proposed two-layer GMM structure increases recognition speed by about 38% on a Chinese speech database, while the word error rate (WER) of the system remains almost the same as with traditional VTS. In addition, the MEVTS initialization yields more accurate estimates of the noise parameters and reduces the WER, especially at low SNR.
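
The two-stage structure described in (2) can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes static log-spectral features (so the usual mismatch model y = x + log(1 + exp(n - x)) applies per dimension without a DCT), diagonal covariances, and a noise model reduced to its mean, with the noise variance ignored. All function and variable names are hypothetical. The noise mean is estimated with a first-order VTS update on a small GMM, and the clean features are then estimated with a larger GMM.

```python
import math

def softplus(z):
    """log(1 + exp(z)), computed stably."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(z):
    """1 / (1 + exp(-z)), the derivative of softplus."""
    return math.exp(-softplus(-z))

def log_gauss(y, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (a - m) ** 2 / v
                      for a, m, v in zip(y, mean, var))

def posteriors(y, gmm, noise_mean):
    """p(k | y) under the noise-adapted GMM (mean shift only; the clean
    variances are reused, which is a simplifying assumption)."""
    logs = []
    for w, mu, var in gmm:
        mu_y = [m + softplus(n - m) for m, n in zip(mu, noise_mean)]
        logs.append(math.log(w) + log_gauss(y, mu_y, var))
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def estimate_noise_mean(frames, gmm, init, iters=5):
    """Stage 1: iterative noise-mean update using a first-order expansion
    of the mismatch function around the current estimate."""
    n = list(init)
    dim = len(n)
    for _ in range(iters):
        num = [0.0] * dim
        den = [0.0] * dim
        for y in frames:
            post = posteriors(y, gmm, n)
            for p, (_w, mu, var) in zip(post, gmm):
                for d in range(dim):
                    g = sigmoid(n[d] - mu[d])            # d(mu_y) / d(n)
                    mu_y = mu[d] + softplus(n[d] - mu[d])
                    num[d] += p * g * (y[d] - mu_y) / var[d]
                    den[d] += p * g * g / var[d]
        n = [n[d] + num[d] / den[d] if den[d] > 0 else n[d]
             for d in range(dim)]
    return n

def compensate(y, gmm, noise_mean):
    """Stage 2: MMSE-style clean feature estimate for one frame."""
    post = posteriors(y, gmm, noise_mean)
    return [a - sum(p * softplus(n - mu[d])
                    for p, (_w, mu, _v) in zip(post, gmm))
            for d, (a, n) in enumerate(zip(y, noise_mean))]

def two_layer_vts(frames, small_gmm, large_gmm, init_noise):
    """Small GMM for noise estimation, large GMM for compensation."""
    noise = estimate_noise_mean(frames, small_gmm, init_noise)
    return noise, [compensate(y, large_gmm, noise) for y in frames]

# Toy 1-D example: each GMM is a list of (weight, mean, variance) tuples.
small_gmm = [(1.0, [5.0], [1.0])]
large_gmm = [(0.5, [5.0], [1.0]), (0.5, [6.0], [1.0])]
frames = [[5.0 + softplus(3.0 - 5.0)] for _ in range(4)]  # true noise mean 3.0
noise_hat, cleaned = two_layer_vts(frames, small_gmm, large_gmm, [4.0])
```

In this sketch the iterative loop in `estimate_noise_mean` costs roughly iterations × frames × mixtures, so running it on the smaller GMM is where the savings would come from, while the single compensation pass still uses the finer model. The `init_noise` argument is where an environment-specific starting value of the kind described in (3) would be supplied.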
Keywords/Search Tags: vector Taylor series, feature compensation, structure optimization, multi-environmental model, initial value optimization