Research On Acoustic Analysis And Prosody Modeling For Xian-Dialect

Posted on: 2010-10-29
Degree: Master
Type: Thesis
Country: China
Candidate: W T Guo
Full Text: PDF
GTID: 2178360278996883
Subject: Circuits and Systems
Abstract/Summary:
Dialect speech synthesis is an important research topic in human-computer speech communication. To synthesize dialect speech, a text-to-speech system must generate a suitable prosodic model. This thesis studies the acoustic features of Xi'an dialect, compares them with those of Mandarin, and introduces a prosodic model that converts Mandarin speech to Xi'an dialect. The results are valuable for both speech theory and applications: they clarify the relation between Mandarin and Xi'an dialect and support the synthesis of dialect speech. The main achievements and original contributions are as follows.

First, based on an analysis of the phonological features of Mandarin and Xi'an dialect, a large text corpus was built from the "Dialect Investigation Word Table". It comprises 160 single syllables, 488 two-syllable words, 19 carrier sentences and 23 text sentences with different intonations. The single syllables cover all initials, finals and the four tones, arranged in pairs; the two-syllable words cover twenty kinds of tone combinations; and the carrier sentences take the form "X say X is the X".

Secondly, the acoustic features were compared and analyzed with a home-grown text analysis tool. Acoustic parameters were computed statistically to analyze the static and dynamic tone features of single syllables, two-syllable words and carrier sentences in Mandarin and Xi'an dialect, and the label information was checked manually. The acoustic parameters of both varieties were analyzed from several points of view, and the experimental results yielded tone conversion rules for mapping Mandarin to Xi'an dialect.

Thirdly, a novel dialect prosodic conversion method is proposed. Given the characteristics of Xi'an dialect speech, the Pitch Target prosodic model was chosen to model f0 contours. The model parameters are then transformed linearly to obtain new f0 contours, from which the converted dialect speech is produced. In the MOS subjective evaluation, the average MOS was 4.1 for tone 1, 4.6 for tone 2, 3.3 for tone 3 and 4.5 for tone 4.

Finally, a novel model, named the five-degree tone value model, is proposed for conversion to Xi'an dialect. The method transforms the five-degree value of any tone into a time-varying voiced frequency. The f0 contours of Xi'an dialect are obtained from the model, and the conversion is achieved by modifying the f0 contours of Mandarin accordingly. Separate single-syllable and two-syllable models were built to convert the different tones. The results show that this model outperformed the Pitch Target method, with a minimum MOS of 4.7 and a minimum ABX score of 72.5% for single syllables, and a minimum of 70% of MOS scores (considered the best result) and a minimum ABX score of 73% for two-syllable words.
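The idea of turning a five-degree tone value into a time-varying f0 contour can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formula, so the code below assumes a logarithmic mapping between a speaker's lowest and highest f0 and linear interpolation between tone values over time. The function names `tone_value_to_f0` and `tone_contour`, the speaker range and the tone values passed in are all hypothetical placeholders, not values from the thesis.

```python
def tone_value_to_f0(t, f_min, f_max):
    """Map a five-degree tone value t (1 = lowest, 5 = highest) to Hz.

    Assumption: the five degrees are spaced evenly on a log-frequency
    scale between the speaker's f_min and f_max. The thesis may use a
    different mapping.
    """
    if not 1.0 <= t <= 5.0:
        raise ValueError("tone value must lie in [1, 5]")
    ratio = (t - 1.0) / 4.0
    return f_min * (f_max / f_min) ** ratio


def tone_contour(tone_values, f_min, f_max, n_frames):
    """Build an f0 contour (list of Hz values) for one syllable.

    tone_values is a sequence such as (5, 1) for a high-falling tone.
    Tone values are interpolated linearly over time (an assumption),
    then each interpolated value is mapped to Hz.
    """
    if len(tone_values) < 2 or n_frames < 2:
        raise ValueError("need at least two tone values and two frames")
    n_seg = len(tone_values) - 1
    contour = []
    for i in range(n_frames):
        pos = i / (n_frames - 1) * n_seg      # position along the tone values
        k = min(int(pos), n_seg - 1)          # index of the current segment
        frac = pos - k                        # fraction within that segment
        t = tone_values[k] + frac * (tone_values[k + 1] - tone_values[k])
        contour.append(tone_value_to_f0(t, f_min, f_max))
    return contour


if __name__ == "__main__":
    # Placeholder speaker range and tone value, for illustration only.
    for hz in tone_contour((5, 1), f_min=100.0, f_max=400.0, n_frames=5):
        print(round(hz, 1))
```

Under these assumptions, a target contour for a dialect tone would be generated from its five-degree value and then used to replace or rescale the f0 contour of the corresponding Mandarin syllable. The Xi'an-dialect tone values and the speaker range would have to come from measured data such as the corpus described above.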
Keywords/Search Tags: Xi'an dialect, corpus, Pitch Target model, normalized tone model, prosody conversion