
Research On 3D Visualization Of Speech

Posted on: 2017-01-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R Li
Full Text: PDF
GTID: 1108330485453636
Subject: Control Science and Engineering
Abstract/Summary:
Human communication is a multi-modal process. Besides the information conveyed by the voice signal, it includes the visible information provided by facial expressions and body postures. Although clear auditory information is sufficient for basic communication, visible information helps make communication more effective and vivid. Cognitive psychology experiments have shown that combining the auditory signal with visible information aids comprehension more than the auditory signal alone. Effective lip reading and observation of facial expressions and gestures can help people with hearing loss accurately understand a speaker's meaning, and understanding the movements of the lips, jaw, palate, and tongue can improve the effectiveness of language learners' speech learning.

Motivated by human-machine speech interaction, this dissertation studies 3D visible speech and related methods, covering the theory, methods, and applications of 3D visible speech. Its goal is to construct a 3D visible-speech talking-head system with synchronized speech and articulatory movements. This work contributes not only to the development of human-machine interaction and computer-aided speech interaction, but also to improved methods for articulatory modeling, animation, and accuracy evaluation in 3D visible speech research.

The main contents and highlights of this dissertation are summarized as follows.

First, the articulators in the oral cavity can be divided into partially visible and invisible articulators according to their physiological positions. To make the shape changes and movements of each articulator visible, corresponding modeling and movement-simulation methods are studied. Medical imaging techniques were used to acquire the shape data of the articulators: the 3D surface model of each articulator was constructed from sagittal and transverse Magnetic Resonance Imaging (MRI) planes, with preprocessing applied before mesh generation. Modeling methods were chosen according to how much each articulator deforms. For articulators that rarely deform, such as the teeth, hard palate, and lower jaw, only surface meshes were built and the articulators were treated as rigid bodies; the tongue and soft palate, which undergo large shape changes, were further modeled with a mass-spring method after their surface meshes were obtained. The articulatory movements are driven and controlled by Electromagnetic Articulography (EMA) data, with a driving method specified for each articulator according to its movement characteristics. The proposed modeling methods were validated by generating articulatory animations for Chinese speech.
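As an illustration of the mass-spring modeling step, the following C++ sketch (C++ matching the system implementation language mentioned below) updates a deformable articulator mesh whose vertices are particles joined by springs along the mesh edges. All type names, constants, and the explicit-Euler integrator are illustrative assumptions; the dissertation does not publish its code.

// Minimal mass-spring sketch for a deformable articulator mesh.
// All names and constants are illustrative, not the author's code.
#include <cmath>
#include <vector>

struct Vec3 {
    double x = 0, y = 0, z = 0;
    Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }
    Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
    Vec3 operator*(double s) const { return {x * s, y * s, z * s}; }
    double norm() const { return std::sqrt(x * x + y * y + z * z); }
};

struct Particle {            // one mesh vertex with physical state
    Vec3 pos, vel, force;
    double mass = 1.0;
    bool pinned = false;     // e.g. vertices fixed to bone or driven by EMA data
};

struct Spring {              // one mesh edge
    int a, b;                // particle indices
    double rest;             // rest length taken from the MRI-derived mesh
    double k;                // stiffness
};

// One explicit-Euler time step with linear (Hooke) springs and velocity damping.
void step(std::vector<Particle>& ps, const std::vector<Spring>& springs,
          double dt, double damping = 0.98) {
    for (auto& p : ps) p.force = {0, 0, 0};
    for (const auto& s : springs) {
        Vec3 d = ps[s.b].pos - ps[s.a].pos;
        double len = d.norm();
        if (len < 1e-9) continue;
        Vec3 f = d * (s.k * (len - s.rest) / len);  // pulls edge toward rest length
        ps[s.a].force = ps[s.a].force + f;
        ps[s.b].force = ps[s.b].force - f;
    }
    for (auto& p : ps) {
        if (p.pinned) continue;                     // driven vertices are set externally
        p.vel = (p.vel + p.force * (dt / p.mass)) * damping;
        p.pos = p.pos + p.vel * dt;
    }
}

In such a scheme, the EMA-driven vertices would be marked as pinned and their positions set each frame from the measured sensor trajectories, while the remaining vertices follow the spring dynamics.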
Second, to prevent penetration artifacts during simulation, a collision-handling method is proposed for the deformable articulators, which undergo a series of complex movements and deformations. These collisions fall into two cases: deformable articulator versus rigid articulator, and deformable articulator versus deformable articulator. Collision handling consists of collision detection and collision response, and proceeds as follows: first, the intersection between the movement trace of each surface point of an articulator and the surface meshes of the other articulators is computed to determine whether, and where, a collision occurs; second, a fast collision-response method prevents the moving articulator from penetrating the others (a detection sketch is given after this summary). Collision-handling results for the tongue and lips show that the method is efficient and effective.

Third, accuracy evaluation of articulatory movements is a difficult task in 3D visible speech studies. This dissertation gives a detailed and comprehensive evaluation of the synthesized articulatory movements using both objective and subjective measures. For the objective part, a shape-comparison-based accuracy evaluation method is proposed: the side-view contours of the articulators extracted from the generated animation are compared with contours labeled in medical films (a sketch of one possible contour metric also follows below). Unlike the traditional objective evaluation, which computes the movement error of a limited number of points on the model surface against EMA data, the proposed method uses contour-shape information from the medical film, which EMA data cannot provide. Experimental results show that the method is a useful tool for evaluating the accuracy of articulatory movements in 3D visible speech research.

Finally, a 3D visible-speech talking head was constructed that not only produces speech-synchronized facial animation but also gives users an intuitive view of articulatory movements inside the oral cavity. The system is developed in C++ with OpenGL, and the EMA data are collected with an NDI Wave system. To provide a friendly human-machine interface, the virtual talking head includes lip movements in addition to the articulatory movements in the oral cavity.
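The detection step described above, intersecting each surface point's movement trace with the other articulators' meshes, can be sketched with a standard segment-triangle test. The C++ fragment below uses the Moller-Trumbore construction clipped to the segment; it is a minimal illustration under assumed types, not the dissertation's implementation.

// Does the trajectory of a surface vertex over one frame (segment p0 -> p1)
// cross a triangle (v0, v1, v2) of another articulator's mesh?
#include <array>
#include <cmath>
#include <optional>

using Vec3 = std::array<double, 3>;

static Vec3 sub(const Vec3& a, const Vec3& b) { return {a[0]-b[0], a[1]-b[1], a[2]-b[2]}; }
static Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0]};
}
static double dot(const Vec3& a, const Vec3& b) { return a[0]*b[0]+a[1]*b[1]+a[2]*b[2]; }

// Returns the parameter t in [0,1] along p0 -> p1 where the segment pierces
// the triangle, or nothing if there is no crossing this frame.
std::optional<double> segmentHitsTriangle(const Vec3& p0, const Vec3& p1,
                                          const Vec3& v0, const Vec3& v1, const Vec3& v2) {
    const double eps = 1e-9;
    Vec3 dir = sub(p1, p0);
    Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    Vec3 h = cross(dir, e2);
    double a = dot(e1, h);
    if (std::fabs(a) < eps) return std::nullopt;   // segment parallel to triangle
    double f = 1.0 / a;
    Vec3 s = sub(p0, v0);
    double u = f * dot(s, h);
    if (u < 0.0 || u > 1.0) return std::nullopt;   // outside triangle
    Vec3 q = cross(s, e1);
    double v = f * dot(dir, q);
    if (v < 0.0 || u + v > 1.0) return std::nullopt;
    double t = f * dot(e2, q);
    if (t < 0.0 || t > 1.0) return std::nullopt;   // outside this frame's motion
    return t;                                      // hit point is p0 + t*(p1 - p0)
}

A simple response consistent with the stated goal would stop the vertex at the reported hit point, p0 + t*(p1 - p0), or project it back onto the struck surface, preventing penetration for that frame.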
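For the shape-comparison evaluation, one plausible contour metric is the symmetric mean closest-point distance between the contour extracted from the animation and the contour labeled in the medical film. The dissertation does not state its exact formula, so the C++ sketch below is an assumed instance of shape comparison, with illustrative names.

// Compare a midsagittal contour from the synthesized animation against one
// labeled in the medical film. Both contours are 2D point sequences; the
// metric here (symmetric mean closest-point distance) is one plausible
// choice, not necessarily the dissertation's.
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

struct Pt { double x, y; };

static double closestDist(const Pt& p, const std::vector<Pt>& contour) {
    double best = std::numeric_limits<double>::max();
    for (const auto& q : contour)
        best = std::min(best, std::hypot(p.x - q.x, p.y - q.y));
    return best;
}

// Symmetric mean closest-point distance between two contours, in the same
// units as the image calibration (e.g. millimetres).
double contourError(const std::vector<Pt>& synthesized, const std::vector<Pt>& reference) {
    double sum = 0.0;
    for (const auto& p : synthesized) sum += closestDist(p, reference);
    for (const auto& q : reference)  sum += closestDist(q, synthesized);
    return sum / static_cast<double>(synthesized.size() + reference.size());
}

Reporting this error per frame, after aligning both contours in a common coordinate frame and calibrating pixels to millimetres, would give the kind of curve-level accuracy measure that point-wise EMA comparison cannot.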
Keywords/Search Tags: speech visualization, articulatory modeling, articulatory movement simulation, medical image, human-machine interaction, speech-synchronized animation