Speech is the most convenient means of communication, and a great many dialects are used in human-human dialogue. Because dialect conversion can improve the harmony, diversity and efficiency of human-machine speech interaction, it has become an important research topic in the field of human-computer speech communication. To synthesize dialect speech, a prosodic model must be built for the text-to-speech (TTS) synthesis system. In this work, a Lanzhou dialect corpus was built based on the "Word-list in dialectal survey". Prosodic models based on a five-degree tone model were built for Lanzhou dialect by analyzing the differences in pitch, duration and pause duration between Lanzhou dialect and Mandarin, and Lanzhou dialect speech was then obtained by converting Mandarin speech. The results are valuable for speech theory and applications, both in clarifying the relation between Mandarin and Lanzhou dialect and in achieving the synthesis of dialect speech. The main achievements and original contributions are as follows.

First, a Lanzhou dialect corpus was built. The text corpus was designed by analyzing the phonological features of Mandarin and Lanzhou dialect and combining them with the "Dialect word list". The speech corpus was recorded as contrastive (Lanzhou dialect vs. Mandarin) recordings and labeled syllable by syllable. The corpus includes 1280 monosyllables, 2000 disyllables, 500 sentences and 18 carrier sentences. The monosyllables were designed to maximize the combinations of initials and finals and to cover the four tones. The disyllables were designed to cover the combinations of the four tones and the light tone. The sentences include colloquial Lanzhou dialect utterances and excerpts from China Daily. The carrier sentences take the form "X say X that X".

Second, the acoustic differences between Lanzhou dialect and Mandarin were analyzed by comparing the acoustic features of monosyllables, disyllables and sentences. Model parameters were analyzed with the Pitch Target model.
The relationships between the different tones and the pause durations were obtained. The spectral centroids and vowel triangles of Lanzhou dialect and Mandarin were also analyzed. The experimental results can be used to build tone conversion rules for mapping Mandarin into Lanzhou dialect.

Third, a novel pitch model, named the five-degree tone value model, was proposed to convert Mandarin into Lanzhou dialect. The pitch contour of Lanzhou dialect was generated by the model, and Lanzhou dialect speech was obtained by modifying the fundamental frequency contours of Mandarin. Fundamental frequency models were built for monosyllables and disyllables, and sentences were divided into monosyllables and disyllables. A fundamental frequency compensation model was also built to achieve more natural continuous speech. Duration and pause duration models were built using statistical methods. Experimental results showed minimum MOS scores of 4.5 for monosyllabic and disyllabic words and 3.5 for sentences.

Fourth, synthesized Lanzhou dialect speech was obtained by concatenating monosyllables. Support vector regression (SVR) was adopted to predict the pitch contour of sentences from the context of their syllables. Four key points on the pitch contour, together with the duration and pause duration, were predicted by SVR from the input text. The prosody was then modified with the predicted acoustic features to achieve fluent utterances. Experiments gave a MOS score of 3.6, which indicates that the synthesized result has high naturalness.
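
As an illustration of the idea behind a five-degree tone value, the sketch below maps fundamental frequency in Hz onto a logarithmic 0 to 5 scale and back, then generates an F0 contour from a short sequence of tone values. The abstract does not state the formula used, so the log-scale normalization, the function names, and the example F0 range of 100 to 400 Hz are all assumptions made for illustration rather than the model described above.

```python
import math


def hz_to_tone_value(f0_hz, f0_min, f0_max):
    """Map F0 (Hz) to a 0..5 tone value on a log scale (assumed normalization)."""
    span = math.log10(f0_max) - math.log10(f0_min)
    return 5.0 * (math.log10(f0_hz) - math.log10(f0_min)) / span


def tone_value_to_hz(tone_value, f0_min, f0_max):
    """Inverse of hz_to_tone_value: map a 0..5 tone value back to Hz."""
    return f0_min * (f0_max / f0_min) ** (tone_value / 5.0)


def contour_from_tone_values(tone_values, n_frames, f0_min, f0_max):
    """Linearly interpolate tone values over n_frames and return F0 in Hz.

    Requires at least two tone values and at least two frames.
    """
    if len(tone_values) < 2 or n_frames < 2:
        raise ValueError("need at least two tone values and two frames")
    last = len(tone_values) - 1
    contour = []
    for i in range(n_frames):
        pos = i * last / (n_frames - 1)      # fractional index into tone_values
        lo = min(int(pos), last - 1)         # left neighbour, clamped at the end
        frac = pos - lo
        t = tone_values[lo] + frac * (tone_values[lo + 1] - tone_values[lo])
        contour.append(tone_value_to_hz(t, f0_min, f0_max))
    return contour


# Hypothetical speaker range and a falling target written as tone values 5 -> 1.
falling = contour_from_tone_values([5, 1], n_frames=5, f0_min=100.0, f0_max=400.0)
```

Working in tone values rather than raw Hz makes a target contour independent of the speaker's pitch range, so the same tone pattern can be rendered for any chosen `f0_min` and `f0_max`.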