The Key Techniques Of Automatic Speech Recognition For The Embedded Computing Platform

Posted on: 2011-01-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Zhao
GTID: 1228330395985349
Subject: Computer Science and Technology
Abstract/Summary:
In the next few decades, advances in communications will radically change the way we live and work. The dream of using devices at any time and in any place, within a given coverage area, for real-time communication and data processing will gradually become reality. Automatic speech recognition (ASR) on embedded computing platforms will be one of the key enabling techniques.

Many ASR systems that perform outstandingly in the laboratory become unstable as soon as they are deployed in complex, real-world noise environments. Conversely, highly robust recognition systems often carry a high computational load, which makes them suitable only for PCs or high-performance servers. Reducing the complexity of an ASR system so that it fits an embedded platform, while also improving its robustness in complex noise environments, is the central difficulty of embedded ASR research. At present, most embedded ASR applications use a distributed structure: the front end of the recognizer runs on the target device, and the complex back end is left to a server. This dissertation addresses the key techniques in the ASR front end for embedded computing platforms.

As the first step of ASR, effective speech endpoint detection reduces processing time in the subsequent steps, eliminates noise interference from the non-speech (silent) segments, and increases recognition accuracy. This dissertation proposes two novel endpoint detection methods. The first combines the time-domain logarithmic energy feature with the frequency-domain spectral entropy feature, yielding an endpoint detection method based on logarithmic energy spectral entropy.
Owing to its low complexity, this method can be applied on low-end and mid-range embedded platforms. Second, because endpoint detection based on nonlinear speech features suppresses noise very well at the cost of somewhat higher complexity, we propose a new endpoint detection method based on sample entropy, which is intended for high-end embedded platforms. Simulation experiments show that, at low signal-to-noise ratio (SNR), the two new methods are more robust, discriminate speech from noise better, and detect endpoints more accurately than the traditional energy method, the spectral entropy method, the energy spectral entropy method, the logarithmic energy method, and others.

Speech enhancement removes as much noise as possible from a noisy signal to obtain a relatively clean one. Completely noise-free output is impossible, so the practical goal is to suppress the background noise while preserving or improving the perceived speech quality. Speech enhancement algorithms based on short-time spectrum estimation suit embedded platforms because of their low complexity, but they sometimes introduce speech distortion. After analyzing the complexity of several typical short-time spectrum estimation algorithms, we found that the RL algorithm requires the fewest additions and multiplications. We therefore improved the RL algorithm by incorporating the masking effect, proposed an improved RL algorithm based on the Bark domain, and further reduced its computational complexity. Experiments show that the improved algorithm suppresses noise significantly, achieves better speech quality, and effectively reduces speech distortion.

Speech feature extraction is the last and most important step of ASR front-end processing and plays a decisive role in speech recognition.
Owing to its good performance, the Mel-Frequency Cepstral Coefficient (MFCC) has become the standard front end of ASR systems. This dissertation makes two improvements to standard MFCC feature extraction. First, the coefficients of the Hamming window are adjusted to improve the performance of the window function. Second, the Subband Spectrum Centroid (SSC) is added to the MFCC process. Traditional speech feature extraction considers the amplitude information of speech but neglects its frequency information. Because the position of the spectral peak in each frequency band is less affected by background noise, it is more robust, and the SSC lies very close to that peak position. We therefore added the SSC to MFCC and proposed a new speech feature extraction method called the Mel Subband Spectrum Centroid (MSSC). HTK simulation experiments show that, at low SNR, the new Hamming window together with MSSC feature extraction increases the recognition rate by 17.13% on average compared with the traditional MFCC method.

Finally, we integrated the endpoint detection, speech enhancement, and feature extraction algorithms described above into an experimental ASR front-end system. We chose the ADSP-BF533, a high-performance multimedia digital signal processor from Analog Devices (ADI), as the embedded platform, optimized the front-end system, and ported it to the ADSP-BF533 successfully, which verifies the feasibility and practicality of applying it on an embedded platform. We then applied some of these research results to the design of a prototype mobile learning platform. In addition, we studied knowledge representation techniques for teaching resources and proposed a new method for calculating ontology concept similarity; we also studied AMR-WB encoder optimization and proposed a fast fixed-codebook search method.
This prototype system has been used in a series of products from Zhongshan Readboy Company, such as early-education machines for children, student PDAs, and digital reading machines, bringing substantial economic benefits. Product development has shown that the research results of this dissertation can be widely applied in embedded systems.
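
As an illustration of the first endpoint detection idea in the abstract (combining time-domain logarithmic energy with frequency-domain spectral entropy), the following is a minimal Python sketch. The abstract does not give the dissertation's formula, so the way the two features are combined here (log energy multiplied by the gap between the maximum possible entropy and the measured entropy), the frame length, the threshold, and all function names are assumptions made for this example only.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum over bins 0..N/2 (O(N^2), fine for a short demo)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def log_energy(frame):
    """Time-domain log energy; the +1 keeps the value non-negative."""
    return math.log(1.0 + sum(x * x for x in frame))


def spectral_entropy(frame, eps=1e-12):
    """Shannon entropy (in nats) of the normalized power spectrum."""
    power = power_spectrum(frame)
    total = sum(power) + eps
    probs = [p / total for p in power]
    return -sum(p * math.log(p + eps) for p in probs)


def frame_score(frame):
    """ASSUMED combination, not the dissertation's formula:
    log energy times (maximum possible entropy - measured entropy).
    Flat, noise-like spectra have entropy near the maximum and score low."""
    bins = len(frame) // 2 + 1
    return log_energy(frame) * (math.log(bins) - spectral_entropy(frame))


def detect_speech(signal, frame_len=64, threshold=1.0):
    """Return one boolean per non-overlapping frame.
    frame_len and threshold are hypothetical values chosen for the demo."""
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    return [frame_score(signal[i:i + frame_len]) > threshold for i in starts]
```

With these defaults, a signal made of one silent frame, one frame of a pure tone, and one frame containing a single click is flagged as [False, True, False]: the tone has high energy and a concentrated spectrum, while the click has a flat spectrum and the silent frame has no energy. A practical detector would estimate the threshold from the noise floor and smooth decisions across frames; those steps are omitted here.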
Keywords/Search Tags: Embedded Computing, Mobile Learning Platform, Automatic Speech Recognition, Speech Endpoint Detection, Speech Enhancement, Speech Feature Extraction