
Research On Speech And Lipreading Human-Machine Interaction For Service Robot

Posted on: 2010-11-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J He
Full Text: PDF
GTID: 1118360302988278
Subject: Mechanical and electrical engineering
Abstract/Summary:
With the aim of realizing a service robot for the disabled and the elderly, this research studies speech and lipreading human-machine interaction on an intelligent wheelchair platform, with lipreading as the focus. Through a detailed analysis of the key problems in this area, we give our answers to them, including a face and region-of-interest (ROI) location algorithm, a feature extraction algorithm, the selection of a pattern recognition model, and a fusion algorithm for speech and lipreading. All of the results have been verified on a speaker-dependent bimodal database. Finally, a real-time speech and lipreading human-machine interaction platform was set up, including software and hardware, in which the intelligent wheelchair is controlled by a PC and a microcontroller. The main achievements are as follows:

To solve the problem of detecting the face and locating the ROI in lipreading, a novel adaptive ROI locating algorithm was proposed. It discards the illumination component of the HSV color model and uses an adaptive algorithm to handle the differences in skin and lip chrominance between people. Face detection and lip location are accomplished at the same time. To test its robustness, the algorithm was verified on the FERET face database, which covers people of different races, and the experimental results showed that it is superior to other current algorithms of the same kind.

Having analyzed different feature extraction algorithms, we proposed a new one: Linear Discriminant Analysis based on Object (LDAO). In speech or speechreading recognition applications, Linear Discriminant Analysis (LDA) usually takes the syllable, the HMM state, or some other unit as the class unit. However, the dimensionality-reduction directions obtained with this traditional LDA have no direct relation to recognition accuracy. To address this problem, an improved LDA algorithm, LDAO, suited to isolated-word recognition in speechreading was proposed. LDAO takes the objects to be recognized as the class units for linear discriminant analysis, which in theory guarantees that feature extraction follows the most discriminant directions among the objects. Training and recognition methods for LDAO were also given. All experiments were performed on the bimodal database, and the results showed that this algorithm is superior to any other appearance-based feature extraction algorithm in speechreading.

Considering the disadvantages of hidden Markov models (HMM) and artificial neural networks (ANN) in pattern recognition, the support vector machine (SVM) was proposed as the lipreading recognition model in this work. SVM is based on structural risk minimization, so it is good at pattern recognition with small samples, and it avoids the unreasonable assumptions made by HMM. In theory, it is the best pattern recognition model for small-sample situations. However, it requires a fixed feature dimensionality in practice, so several feature normalization algorithms were tested and compared in experiments. Experiments on small samples showed that SVM performs better than HMM.

To fuse speech and lipreading effectively while accounting for the asynchrony between them, an intermediate (mid-level) fusion strategy based on the coupled HMM was proposed. The coupled HMM fusion strategy not only takes the temporal correlation into consideration but also solves the asynchrony problem. To reduce computation time, the model is simplified by allowing at most one step of asynchrony between the two channels. Furthermore, the coupled HMM is replaced by an equivalent two-stream HMM, so that it can be trained and used for recognition with the traditional HMM algorithms. Experiments on the bimodal database showed that the coupled HMM fusion strategy is better than the synchronous fusion strategy.

Finally, we set up a real-time speech and lipreading human-machine interaction platform, including software and hardware design, for the first time in China. As speech and lipreading recognition is rather time-consuming, a distributed control system was designed: the upper level is a personal computer, which is responsible for acquiring speech information and lipreading images and for computation, and the lower level is a microcontroller, which controls the wheelchair. A complete software program and some important hardware interface circuits were designed, which provides a fundamental platform for future research in this area.
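The abstract notes that an SVM needs a fixed feature dimensionality, while lipreading utterances differ in the number of video frames. It does not say which feature normalization algorithms were compared, so the following is only a minimal sketch of one common option, assumed here for illustration: linearly interpolating the per-frame feature vectors onto a fixed number of frames and flattening the result. The function names `resample_frames` and `to_fixed_vector` are hypothetical and do not come from the dissertation.

```python
from typing import List, Sequence


def resample_frames(frames: Sequence[Sequence[float]], target_len: int) -> List[List[float]]:
    """Linearly interpolate a sequence of per-frame feature vectors to target_len frames.

    Assumption: linear time interpolation is used purely as an illustration;
    the dissertation does not name its normalization algorithms.
    """
    if not frames:
        raise ValueError("need at least one frame")
    if target_len < 1:
        raise ValueError("target_len must be positive")
    n = len(frames)
    if n == 1 or target_len == 1:
        # Degenerate cases: return copies of the first frame.
        return [list(frames[0]) for _ in range(target_len)]
    out = []
    for k in range(target_len):
        # Position of output frame k on the input time axis, in [0, n - 1].
        pos = k * (n - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        # Weighted average of the two neighbouring input frames.
        out.append([(1.0 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def to_fixed_vector(frames: Sequence[Sequence[float]], target_len: int) -> List[float]:
    """Flatten the resampled frames into one vector of length target_len * dim."""
    return [x for frame in resample_frames(frames, target_len) for x in frame]
```

With this kind of normalization, every utterance yields a vector of length `target_len` times the per-frame feature dimension, whatever its original duration, which is the form a standard SVM expects as input.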
Keywords/Search Tags: human-machine interaction, lipreading, speech, feature extraction, information fusion