
The Research Of Lip Synchronization And Expression Control In Uyghur Visual Speech Synthesis

Posted on: 2015-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: L Cao
Full Text: PDF
GTID: 2298330431991876
Subject: Computer software and theory
Abstract/Summary:
Lip synchronization and expression intensity control are two important issues that must be taken seriously in the field of visual speech synthesis. To address these problems, and taking the characteristics of the Uyghur language into account, this thesis proposes a visual speech synthesis framework that ensures both lip synchronization and expression intensity control. A prototype system compatible with the MPEG-4 standard is built on this framework.

The thesis first describes the methods used to build a Uyghur audio-video (AV) data collection, and then analyzes and processes these data. The calculation of the acoustic features, and of their differences with respect to distinguishing emotions, is described in detail, and the feature vectors for emotion recognition and phoneme recognition are determined. To ensure that the AV data remain consecutive and usable, a posture correction method based on geometric transformation is proposed. On this basis, and considering the characteristic lip height and width of each Uyghur phoneme's pronunciation, a FAP (Facial Animation Parameter) configuration set is established for every viseme class. After observation of Uyghur face video, the six typical expressions (happiness, sadness, disgust, surprise, fear, anger) are each divided into five intensity levels, and a FAP configuration set is established for each level of each expression.

To realize lip synchronization, phoneme boundary division and phoneme recognition are combined. Phoneme boundaries are located with a proposed inter-segment similarity model; based on this division, multi-dimensional MFCCs are extracted and a Hidden Markov Model (HMM) is used to recognize phonemes. Finally, key frame interpolation produces smooth transitions between visemes. Because Uyghur phoneme durations and the animation playback rate are taken into account during interpolation, the realism of the synthesized result is preserved.

To realize expression intensity control, the thesis proposes a "two-step" visual rhythm modulation model. First, an artificial neural network (ANN) recognizes the basic emotion conveyed by the speech; then the energy and pitch curves of the whole sentence are extracted and combined with the FAP configuration of the corresponding expression to control its intensity. Finally, a weighted sum of the viseme FAPs and the expression FAPs is computed, so that expression intensity rises and falls along with the energy and pitch.

Subjective and objective experiments show that the resulting visual speech is convincingly realistic, meets everyday AV requirements, and reaches the synchronization standard proposed by the ATSC. Given both audio and video, the average emotion recognition accuracy reaches 80%, which verifies the effectiveness of the proposed methods for emotional expression and lays a foundation for further research on Uyghur visual speech.
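The posture correction step described above can be illustrated with a small sketch. The thesis does not publish its implementation, so the following is a minimal illustration assuming OpenCV and two detected eye centres per frame; the function name align_frame and the canonical target positions are hypothetical.

```python
import numpy as np
import cv2

def align_frame(frame, eye_l, eye_r, tgt_l=(80.0, 120.0), tgt_r=(176.0, 120.0)):
    """Warp a video frame with the 2-D similarity transform that maps the
    detected eye centres onto fixed canonical positions, removing in-plane
    rotation, scale, and translation differences between frames."""
    # Treat points as complex numbers: one ratio gives rotation + scale.
    p1, p2 = complex(*eye_l), complex(*eye_r)
    q1, q2 = complex(*tgt_l), complex(*tgt_r)
    s = (q2 - q1) / (p2 - p1)          # combined rotation and uniform scale
    t = q1 - s * p1                    # translation
    # z' = s*z + t as a 2x3 affine matrix for cv2.warpAffine.
    M = np.float32([[s.real, -s.imag, t.real],
                    [s.imag,  s.real, t.imag]])
    h, w = frame.shape[:2]
    return cv2.warpAffine(frame, M, (w, h))
```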
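For the phoneme recognition stage, here is a minimal sketch of multi-dimensional MFCC extraction and per-phoneme HMM scoring, assuming librosa and hmmlearn; the feature dimensions, left-to-right topology, and helper names are assumptions, not the thesis's actual configuration.

```python
import numpy as np
import librosa
from hmmlearn import hmm

def extract_mfcc(wav_path, n_mfcc=13):
    """MFCC features for one utterance, shaped (frames, coefficients)."""
    y, sr = librosa.load(wav_path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_phoneme_models(segments_by_phoneme, n_states=3):
    """Fit one GaussianHMM per phoneme class from its MFCC segments."""
    models = {}
    for phoneme, segments in segments_by_phoneme.items():
        X = np.vstack(segments)                # stacked frames of all segments
        lengths = [len(s) for s in segments]   # per-segment frame counts
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=20)
        m.fit(X, lengths)
        models[phoneme] = m
    return models

def recognize(segment_mfcc, models):
    """Pick the phoneme whose model gives the highest log-likelihood."""
    return max(models, key=lambda p: models[p].score(segment_mfcc))
```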
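The duration-aware key frame interpolation could look like the numpy sketch below. Linear blending is shown for simplicity; the thesis does not specify the interpolation kernel, and the fps default is an assumption.

```python
import numpy as np

def interpolate_faps(keyframes, durations, fps=25.0):
    """Interpolate viseme FAP key frames into a per-frame animation track.

    `keyframes` is a list of FAP vectors (one per phoneme) and `durations`
    the measured phoneme durations in seconds, so the number of frames spent
    on each transition follows the phoneme length and the playback rate
    rather than a fixed step."""
    track = []
    for cur, nxt, dur in zip(keyframes, keyframes[1:], durations):
        n = max(1, int(round(dur * fps)))      # frames for this phoneme
        for i in range(n):
            a = i / n                          # linear blend weight
            track.append((1.0 - a) * cur + a * nxt)
    track.append(keyframes[-1])
    return np.asarray(track)
```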
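Finally, the weighted sum of viseme and expression FAPs, with the expression weight driven by frame energy and pitch, might be sketched as follows. The normalization ranges and the equal weighting of energy and pitch are assumptions for illustration only.

```python
import numpy as np

def blend_faps(viseme_fap, expression_fap, energy, pitch,
               energy_range=(0.0, 1.0), pitch_range=(80.0, 300.0)):
    """Weighted sum of viseme and expression FAP vectors for one frame.

    The expression weight tracks normalized frame energy and pitch, so
    expression intensity changes with the prosody of the utterance."""
    e = np.clip((energy - energy_range[0]) /
                (energy_range[1] - energy_range[0]), 0.0, 1.0)
    f = np.clip((pitch - pitch_range[0]) /
                (pitch_range[1] - pitch_range[0]), 0.0, 1.0)
    w = 0.5 * (e + f)                  # expression intensity in [0, 1]
    return viseme_fap + w * expression_fap
```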
Keywords/Search Tags: Uyghur, visual speech synthesis, lip synchronization, expression control, NN, HMM