Design and Realization of a Speech Control System Based on Audio-Visual Bimodal Information

Posted on: 2011-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: L P Yan
Full Text: PDF
GTID: 2178360308964642
Subject: Communication and Information System
Abstract/Summary:
Voice control in the automotive environment frees the driver's hands and eyes, improving driving safety and enjoyment. However, the poor performance of audio-only speech recognition in noisy environments restricts the development of automotive voice control. Another form of automatic speech recognition (ASR), called visual speech recognition (also known as speech reading or lip-reading), uses a video sequence of the speaker's lips. Visual speech can improve the robustness of a recognition system in noisy environments. Audio-visual speech recognition is well suited to vehicles because the driver's position is fixed, which makes the visual features easier to capture. Audio-visual speech recognition for in-vehicle voice control has therefore become an important research topic. To speed up this research, an audio-visual speech recognition simulation system for in-vehicle voice control is built on a PC. This simulation system provides a reference for embedded in-vehicle speech control systems. The main work of this thesis is as follows:

1) The fundamentals of audio-visual speech recognition are studied, and a design is proposed for the audio-visual vehicle control simulation system. Mel Frequency Cepstral Coefficients (MFCC) are used as the audio-only features; they approximate the response of the human auditory system and are robust in the presence of additive noise. A Hidden Markov Model (HMM) is used as the acoustic model. Pixel-based image features of the mouth region serve as the visual-only features. Feature fusion and decision fusion are discussed for audio-visual speech recognition.

2) A Bimodal Speech Recognition database for vehicular control (BiMoSp) is collected. Guidelines for building a bimodal speech database are summarized from existing audio-visual speech databases in China and abroad. All data in BiMoSp are labeled. A labeling tool is also designed, which reduces the labeling workload.

3) The Bimodal Speech Vehicular Control System (BSVCS) is designed and implemented. BSVCS has three sub-systems: model training, online recognition, and offline recognition. The sub-systems are structurally related but functionally independent. Each is composed of several modules, some of which are shared by two or more sub-systems. The model training sub-system includes audio training and visual training modules, and its output is used by the online and offline recognition sub-systems. The Application Toolkit for HTK (ATK) is used for audio-visual signal processing in Visual C/C++. The visual signal processing module is built as a dynamic link library to make the algorithm easier to improve. Offline recognition includes a statistics module that presents the results directly and intuitively. Online recognition includes a human-computer interaction flow module, a result normalization module, and an optional processing module. These modules are designed to provide good human-computer interaction and to reduce interference from audio noise.

4) The performance of the simulation system is tested in different environments, and the experimental results are discussed. The experiments show that audio-visual speech recognition greatly improves on traditional audio-only speech recognition. Audio-visual speech recognition is therefore more useful for BSVCS.
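
The abstract names feature fusion and decision fusion but does not say how either is computed. The Python sketch below illustrates the general idea of both under stated assumptions: feature fusion is shown as plain concatenation of already time-aligned audio and visual frame vectors, and decision fusion as a weighted sum of per-command log-likelihoods. The function names, the command labels, the scores, and the weight values are all hypothetical and are not taken from the thesis or from HTK/ATK.

```python
def feature_fusion(audio_frame, visual_frame):
    """Feature fusion (assumed form): join one audio feature vector and one
    visual feature vector into a single vector for a shared model.
    Assumes the two streams are already aligned to the same frame rate."""
    return list(audio_frame) + list(visual_frame)


def decision_fusion(audio_loglik, visual_loglik, audio_weight):
    """Decision fusion (assumed form): combine per-command log-likelihoods
    from separately trained audio and visual recognizers.
    audio_weight lies in [0, 1]; the visual stream gets 1 - audio_weight."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight
    return {
        command: audio_weight * audio_loglik[command]
        + visual_weight * visual_loglik[command]
        for command in audio_loglik
    }


def recognize(audio_loglik, visual_loglik, audio_weight):
    """Return the command with the highest fused score."""
    fused = decision_fusion(audio_loglik, visual_loglik, audio_weight)
    return max(fused, key=fused.get)


# Hypothetical scores for two made-up commands in a noisy cabin:
# the audio recognizer narrowly prefers the wrong command,
# while the visual recognizer clearly prefers the right one.
audio_scores = {"open_window": -120.0, "close_window": -118.0}
visual_scores = {"open_window": -40.0, "close_window": -55.0}

audio_only = recognize(audio_scores, visual_scores, audio_weight=1.0)
audio_visual = recognize(audio_scores, visual_scores, audio_weight=0.5)
```

With these made-up numbers, the audio-only decision picks `close_window`, while the equally weighted fusion picks `open_window`. In practice the stream weight would be tuned to the noise level, which this sketch does not attempt.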
Keywords/Search Tags:bimodal speech recognition, vehicular control, HTK, ATK, simulation system