
Research On The Acoustic Modeling Of Expression Speech

Posted on: 2012-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: X L Wang
GTID: 2218330341450401
Subject: Circuits and Systems
Abstract/Summary:
Speech, the spoken form of language, is the most natural, efficient and convenient means of human communication. Human speech carries not only linguistic information but also nonverbal information about the speaker's feelings and emotions, so speech synthesis that can express emotion has become an active research topic. The expressivity of affective speech is a topic of high theoretical and practical value. Working with a corpus of neutral speech and ten types of emotional speech, this dissertation introduced the three-dimensional PAD (Pleasure-Arousal-Dominance) emotion model to tag expressivity, used the TBL algorithm to predict prosodic boundaries, and built a fundamental frequency (F0) contour model for each syllable on the five-degree tone scale. Building on this work, we established a prosody prediction model based on a generalized regression neural network (GRNN) that transforms neutral speech into the other emotions. The main contributions and innovations are as follows:

1. A simplified PAD table is proposed for tagging expressivity. Experimental results show that, when tagging and evaluating the expressivity of speech, the simplified PAD table saves tagging time and improves the consistency of the test results.
2. A new prediction feature for prosodic structure is proposed. Based on the relationship between prosodic phrases and syntactic structure, the height of the syntax tree is used as a prediction feature, and prosodic phrase prediction is then performed with the TBL model.
3. A five-degree tone model of expressive syllables is built using polynomial regression, and the differences between neutral speech and emotional speech are analyzed and compared.
4. A modeling method based on the generalized regression neural network (GRNN) is proposed to transform neutral speech into emotional speech. The context parameters of each emotional utterance are taken as input, and the parameters of the F0 contour model, the segment duration and the pause duration are taken as output. The model can therefore predict the acoustic features of speech from the context parameters and the tagged emotional PAD values. Conversion from neutral to emotional speech is then carried out with the STRAIGHT algorithm. In an emotional mean opinion score (EMOS) evaluation, the ten types of converted speech received an average score of 4.0, indicating that the converted speech can express complex emotions.
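Contribution 2 combines TBL (transformation-based error-driven learning) with a syntax-tree-height feature. The abstract does not give the rule templates or the full feature set, so the following is only a minimal Python sketch of TBL for binary boundary tagging. It assumes that each word juncture is described by a dict of features (the keys `tree_height` and `pos` are hypothetical stand-ins), that the initial-state annotator marks no boundaries, and that every rule has the form "if feature f equals v and the current label is L, flip L". Learning greedily keeps the rule with the largest net error reduction until no rule helps.

```python
from collections import Counter


def apply_rule(rule, feats, label):
    """Flip `label` if the juncture matches the rule's (feature, value, from_label) condition."""
    feature, value, from_label = rule
    if feats.get(feature) == value and label == from_label:
        return 1 - from_label
    return label


def learn_tbl_rules(samples, gold, features, max_rules=20, min_gain=1):
    """Greedy transformation-based error-driven learning for 0/1 boundary labels.

    samples: list of feature dicts, one per word juncture (hypothetical layout)
    gold:    list of reference labels (1 = prosodic phrase boundary, 0 = none)
    """
    current = [0] * len(samples)  # initial-state annotator: no boundaries (assumption)
    rules = []
    for _ in range(max_rules):
        gains = Counter()
        for feats, cur, ref in zip(samples, current, gold):
            for feature in features:
                rule = (feature, feats.get(feature), cur)
                # Flipping fixes this juncture if it is wrong and breaks it if it is right.
                gains[rule] += 1 if cur != ref else -1
        if not gains:
            break
        best, gain = gains.most_common(1)[0]
        if gain < min_gain:
            break  # no remaining rule reduces the error count enough
        rules.append(best)
        current = [apply_rule(best, f, c) for f, c in zip(samples, current)]
    return rules


def predict(rules, samples):
    """Start from the same initial state and apply the learned rules in order."""
    labels = [0] * len(samples)
    for rule in rules:
        labels = [apply_rule(rule, f, lab) for f, lab in zip(samples, labels)]
    return labels
```

Practical TBL taggers usually also let rules look at neighbouring junctures. This sketch keeps only per-juncture templates for brevity.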
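Contribution 3 fits syllable F0 contours on the five-degree tone scale with polynomial regression. The abstract specifies neither the normalization nor the polynomial order. This Python sketch therefore makes three assumptions. It uses a log-scale mapping of the kind often used for five-degree (T-value) analysis, T = 5(log f − log f_min)/(log f_max − log f_min), where f_min and f_max bound the speaker's F0 range. It expects a contour sampled at equal time steps across the syllable, and it uses a cubic fit by default. The function names are hypothetical.

```python
import numpy as np


def to_five_degree(f0_hz, f0_min, f0_max):
    """Map F0 values (Hz) onto a 0-5 scale by log normalization over the speaker's range (assumed formula)."""
    f0 = np.asarray(f0_hz, dtype=float)
    return 5.0 * (np.log10(f0) - np.log10(f0_min)) / (np.log10(f0_max) - np.log10(f0_min))


def fit_syllable_contour(f0_hz, f0_min, f0_max, degree=3):
    """Least-squares polynomial fit of one syllable's five-degree contour over normalized time [0, 1]."""
    t_values = to_five_degree(f0_hz, f0_min, f0_max)
    t = np.linspace(0.0, 1.0, len(t_values))
    return np.polyfit(t, t_values, degree)  # coefficients, highest power first


def resynthesize_contour(coeffs, f0_min, f0_max, n_points=20):
    """Evaluate a fitted contour and map it back from the five-degree scale to Hz."""
    t = np.linspace(0.0, 1.0, n_points)
    t_values = np.polyval(coeffs, t)
    # Inverse of to_five_degree: f = f_min * (f_max / f_min) ** (T / 5)
    return f0_min * (f0_max / f0_min) ** (t_values / 5.0)
```

Fitting in the normalized domain makes the coefficients comparable across speakers with different F0 ranges. This is one reasonable way to compare neutral and emotional versions of the same syllable.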
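Contribution 4 maps context parameters and PAD values to prosodic targets with a GRNN. A GRNN predicts each output as a Gaussian-kernel-weighted average of the stored training targets, ŷ(x) = Σᵢ yᵢ·exp(−‖x − xᵢ‖²/(2σ²)) / Σᵢ exp(−‖x − xᵢ‖²/(2σ²)), with a single smoothing parameter σ. The NumPy sketch below implements that estimator. Several details are assumptions rather than the thesis's choices: the input layout (for example, context features concatenated with the three PAD values), the output layout (for example, F0 contour coefficients, duration and pause duration), and the default σ = 0.5.

```python
import numpy as np


class GRNN:
    """General regression neural network: Gaussian-kernel-weighted average of training targets."""

    def __init__(self, sigma=0.5):
        self.sigma = sigma  # smoothing parameter (spread); 0.5 is an arbitrary default

    def fit(self, X, Y):
        # "Training" only stores the pattern layer: inputs X (n, d) and targets Y (n, k).
        self.X = np.asarray(X, dtype=float)
        self.Y = np.asarray(Y, dtype=float)
        if self.Y.ndim == 1:
            self.Y = self.Y[:, None]
        return self

    def predict(self, Xq):
        Xq = np.atleast_2d(np.asarray(Xq, dtype=float))
        # Squared Euclidean distance from each query to each stored pattern: (n_query, n_train).
        d2 = ((Xq[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=2)
        # Subtract each row's minimum before exponentiating to avoid underflow; the ratio is unchanged.
        w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * self.sigma ** 2))
        # Summation and output layers: weighted sum of targets divided by the sum of weights.
        return (w @ self.Y) / w.sum(axis=1, keepdims=True)
```

Because σ is the network's only free parameter, it is typically chosen by cross-validation. Input features are usually normalized first so that the PAD values and the context codes contribute on comparable scales.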
Keywords/Search Tags: Emotional speech, Prosodic phrase, F0 contour, Transformation-based Error-driven Learning (TBL), Generalized regression neural network (GRNN)