
The Research Of Lip Synchronization And Expression Control In Uyghur Visual Speech Synthesis

Posted on: 2015-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: L Cao
Full Text: PDF
GTID: 2298330431991876
Subject: Computer software and theory
Abstract/Summary:
Lip synchronization and expression intensity control are two important issues that must be taken seriously in the field of visual speech synthesis. To address these problems, and taking the characteristics of the Uyghur language into account, this thesis proposes a visual speech synthesis framework that ensures both lip synchronization and expression intensity control. A prototype system compatible with the MPEG-4 standard is built on this framework.

The thesis first describes the methods used to build a Uyghur audio-video (AV) data collection, and then analyzes and processes these data. The calculation of the acoustic features, and of their differences with respect to distinguishing emotions, is described in detail, and the feature vectors for emotion recognition and phoneme recognition are determined. To ensure that the AV data remain consecutive and usable, a posture correction method based on geometric transformation is proposed. On this basis, and considering the characteristic lip height and width of each Uyghur phoneme's pronunciation, a FAP (Facial Animation Parameter) configuration set is established for every viseme class. After observation of Uyghur face video, the six typical expressions (happiness, sadness, disgust, surprise, fear, anger) are each divided into five intensity levels, and a FAP configuration set is established for each level of each expression.

To realize lip synchronization, phoneme boundary division and phoneme recognition are combined. Phoneme boundaries are located with a proposed inter-segment similarity model; based on this division, multi-dimensional MFCCs are extracted and a Hidden Markov Model (HMM) is used to recognize phonemes. Finally, key frame interpolation produces smooth transitions between visemes. Because Uyghur phoneme durations and the animation playback rate are taken into account during interpolation, the realism of the synthesized result is preserved.

To realize expression intensity control, the thesis proposes a "two-step" visual rhythm modulation model. First, an artificial neural network (ANN) recognizes the basic emotion conveyed by the speech; then the energy and pitch curves of the whole sentence are extracted and combined with the FAP configuration of the corresponding expression to control its intensity. Finally, a weighted sum of the viseme FAPs and the expression FAPs is computed, so that expression intensity rises and falls along with the energy and pitch.

Subjective and objective experiments show that the resulting visual speech is convincingly realistic, meets everyday AV requirements, and reaches the synchronization standard proposed by the ATSC. Given both audio and video, the average emotion recognition accuracy reaches 80%, which verifies the effectiveness of the proposed methods for emotional expression and lays a foundation for further research on Uyghur visual speech.
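The posture correction step described above can be illustrated with a small sketch. The thesis does not publish its implementation, so the following is a minimal illustration assuming OpenCV and two detected eye centres per frame; the function name align_frame and the canonical target positions are hypothetical.

```python
import numpy as np
import cv2

def align_frame(frame, eye_l, eye_r, tgt_l=(80.0, 120.0), tgt_r=(176.0, 120.0)):
    """Warp a video frame with the 2-D similarity transform that maps the
    detected eye centres onto fixed canonical positions, removing in-plane
    rotation, scale, and translation differences between frames."""
    # Treat points as complex numbers: one ratio gives rotation + scale.
    p1, p2 = complex(*eye_l), complex(*eye_r)
    q1, q2 = complex(*tgt_l), complex(*tgt_r)
    s = (q2 - q1) / (p2 - p1)          # combined rotation and uniform scale
    t = q1 - s * p1                    # translation
    # z' = s*z + t as a 2x3 affine matrix for cv2.warpAffine.
    M = np.float32([[s.real, -s.imag, t.real],
                    [s.imag,  s.real, t.imag]])
    h, w = frame.shape[:2]
    return cv2.warpAffine(frame, M, (w, h))
```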
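For the phoneme recognition stage, here is a minimal sketch of multi-dimensional MFCC extraction and per-phoneme HMM scoring, assuming librosa and hmmlearn; the feature dimensions, left-to-right topology, and helper names are assumptions, not the thesis's actual configuration.

```python
import numpy as np
import librosa
from hmmlearn import hmm

def extract_mfcc(wav_path, n_mfcc=13):
    """MFCC features for one utterance, shaped (frames, coefficients)."""
    y, sr = librosa.load(wav_path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_phoneme_models(segments_by_phoneme, n_states=3):
    """Fit one GaussianHMM per phoneme class from its MFCC segments."""
    models = {}
    for phoneme, segments in segments_by_phoneme.items():
        X = np.vstack(segments)                # stacked frames of all segments
        lengths = [len(s) for s in segments]   # per-segment frame counts
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=20)
        m.fit(X, lengths)
        models[phoneme] = m
    return models

def recognize(segment_mfcc, models):
    """Pick the phoneme whose model gives the highest log-likelihood."""
    return max(models, key=lambda p: models[p].score(segment_mfcc))
```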
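The duration-aware key frame interpolation could look like the numpy sketch below. Linear blending is shown for simplicity; the thesis does not specify the interpolation kernel, and the fps default is an assumption.

```python
import numpy as np

def interpolate_faps(keyframes, durations, fps=25.0):
    """Interpolate viseme FAP key frames into a per-frame animation track.

    `keyframes` is a list of FAP vectors (one per phoneme) and `durations`
    the measured phoneme durations in seconds, so the number of frames spent
    on each transition follows the phoneme length and the playback rate
    rather than a fixed step."""
    track = []
    for cur, nxt, dur in zip(keyframes, keyframes[1:], durations):
        n = max(1, int(round(dur * fps)))      # frames for this phoneme
        for i in range(n):
            a = i / n                          # linear blend weight
            track.append((1.0 - a) * cur + a * nxt)
    track.append(keyframes[-1])
    return np.asarray(track)
```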
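Finally, the weighted sum of viseme and expression FAPs, with the expression weight driven by frame energy and pitch, might be sketched as follows. The normalization ranges and the equal weighting of energy and pitch are assumptions for illustration only.

```python
import numpy as np

def blend_faps(viseme_fap, expression_fap, energy, pitch,
               energy_range=(0.0, 1.0), pitch_range=(80.0, 300.0)):
    """Weighted sum of viseme and expression FAP vectors for one frame.

    The expression weight tracks normalized frame energy and pitch, so
    expression intensity changes with the prosody of the utterance."""
    e = np.clip((energy - energy_range[0]) /
                (energy_range[1] - energy_range[0]), 0.0, 1.0)
    f = np.clip((pitch - pitch_range[0]) /
                (pitch_range[1] - pitch_range[0]), 0.0, 1.0)
    w = 0.5 * (e + f)                  # expression intensity in [0, 1]
    return viseme_fap + w * expression_fap
```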
Keywords/Search Tags: Uyghur, visual speech synthesis, lip synchronization, expression control, NN, HMM