
Computational Auditory Scene Analysis Based Voice Pretreatment System

Posted on: 2014-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: M Liu
GTID: 2298330422490582
Subject: Information and Communication Engineering
Abstract/Summary:
With the development of modern communication technology, the types of noise and interference have become increasingly complex. Speech signals are often corrupted by environmental noise, which causes a sharp decline in voice quality. Conventional speech recognition systems, such as IBM ViaVoice, perform well on clean speech or in high signal-to-noise ratio (SNR) scenarios. Unfortunately, in many cases the noise and interference in a mobile environment push the SNR below the detection threshold, which degrades the performance of these recognition systems. Improving the anti-interference capability of speech recognition systems in mobile environments has therefore become an urgent problem.

Most existing speech recognition techniques are based on pattern recognition and do not include a noise reduction stage for the speech signal. To address this issue, we devise a new speech recognition system based on computational auditory scene analysis (CASA). Unlike conventional methods, the proposed system adds a CASA speech preprocessing module before the speech recognition engine, and can thus improve speech recognition accuracy in mobile environments. By exploiting cross-channel correlation and time-domain continuity, all audio elements that correspond to the same source can be merged into one fragment, so that the speech signal of interest can be separated. Furthermore, we use the hidden Markov model toolkit (HTK) to build a Chinese speech database, and employ an endpoint detection algorithm in extracting the Mel-frequency cepstral coefficient (MFCC) features. Finally, we perform grammar training by combining the re-estimation algorithm with the MFCC features, and then use the hidden Markov model (HMM) to build the CASA-based speech recognition system.

Simulations are carried out to verify the effectiveness of the proposed system. In the simulations, we use two types of noise: road noise and indoor café noise. We test the robustness of the proposed scheme at different SNRs. The simulation results show that our approach is more robust than the conventional methods in terms of recognition accuracy, especially at low SNRs.
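The abstract does not say how the noisy test signals were produced. A common way to evaluate robustness at a chosen SNR is to scale a noise recording so that the speech-to-noise power ratio hits the target, then add it to the clean speech. The following is a minimal sketch of that idea, not the thesis's procedure; the function names `mix_at_snr` and `measure_snr_db`, the synthetic sine "speech", and the white-noise stand-in for road or café noise are all assumptions made for illustration.

```python
import numpy as np


def mix_at_snr(speech, noise, target_snr_db):
    """Return (mixture, scaled_noise) with speech/noise power ratio = target_snr_db.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)

    # Repeat or trim the noise so it covers the whole speech signal.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)

    # Choose gain g so that p_speech / (g^2 * p_noise) = 10^(SNR/10).
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise


def measure_snr_db(signal, noise):
    """SNR in dB computed from the average power of each signal."""
    return 10.0 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))


# Synthetic stand-ins (assumptions): a 220 Hz tone for speech, white noise for the interferer.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = np.sin(2 * np.pi * 220.0 * t)
noise = rng.standard_normal(8000)

for target in (-5.0, 0.0, 5.0, 10.0):
    mixture, scaled = mix_at_snr(speech, noise, target)
    print(f"target {target:+.0f} dB, measured {measure_snr_db(speech, scaled):+.2f} dB")
```

In an actual experiment the synthetic arrays would be replaced by recorded speech and recorded noise, and the resulting mixtures would be passed through the preprocessing module and the recognizer at each SNR.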
Keywords/Search Tags: speech processing, speech recognition, speech separation, auditory scene analysis, HTK