
Research On Speech Generation Of Dongxiang Dialect

Posted on: 2019-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: F K Qi
GTID: 2428330545481736
Subject: Engineering
Abstract/Summary:
Nowadays, most speech synthesis research follows the Text-to-Speech (TTS) paradigm: the synthesis system converts input text into speech through a series of linguistic processing steps such as text normalization, segmentation and syntactic analysis. However, because China has a vast territory and many ethnic groups, some dialects have no written form, and speech synthesis research for such dialects is still in its infancy. This thesis explores the linguistic and phonetic characteristics of the Dongxiang language, a dialect without a written form. It also designs a phonetic transcription scheme and collects a corpus of Dongxiang speech labeled with that scheme. To synthesize speech, the thesis tries methods based on both Hidden Markov Models (HMM) and Deep Neural Networks (DNN). The main contributions and innovations of the thesis are as follows:

1. A corpus of the Dongxiang language was built, and its linguistic characteristics, such as vowels, consonants, phrases and sentence structures, were analyzed, together with experimental phonetic features such as tone frequency and intonation. The corpus contains 800 sentences: half are bus broadcast announcements and the other half are everyday Dongxiang utterances. It covers the pronunciation of vowels and consonants and the intonation of common words. Each sentence was recorded by a native speaker and stored as a Microsoft WAV file (single channel, 16-bit samples, 16 kHz sampling rate).
2. A phonetic transcription scheme for the Dongxiang language, SAMPA-DX (Speech Assessment Methods Phonetic Alphabet for Dong Xiang), was designed, and every sentence in the corpus was labeled with it.
3. Speech generation for the Dongxiang dialect was achieved without text. HMM-based and DNN-based acoustic models were trained on the labeled speech corpus using context analysis. For a given bus-stop announcement, the system first obtains context-dependent labels through context analysis and then generates the Dongxiang speech for that announcement with either the HMM or the DNN models. Our assessments indicate that the synthesized speech achieves high speech quality, naturalness and similarity.
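Item 1 fixes the recording format of the corpus (single-channel, 16-bit, 16 kHz WAV). Before training acoustic models, it can be worth confirming that every file actually matches that format. The sketch below uses only Python's standard `wave` module; the helper names `check_wav_format` and `make_test_wav` are hypothetical and are not taken from the thesis.

```python
import io
import wave

# Recording format stated in the abstract: mono, 16-bit samples, 16 kHz.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits
EXPECTED_RATE_HZ = 16000


def check_wav_format(source) -> list[str]:
    """Return a list of format problems; an empty list means the file matches.

    `source` is a path string or a binary file-like object (what wave.open accepts).
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channels={wav.getnchannels()}, expected {EXPECTED_CHANNELS}")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width={wav.getsampwidth() * 8} bits, expected 16")
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"rate={wav.getframerate()} Hz, expected {EXPECTED_RATE_HZ}")
    return problems


def make_test_wav(channels: int, width: int, rate: int, seconds: float = 0.1) -> io.BytesIO:
    """Build a silent in-memory WAV file (hypothetical helper for demonstration)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(width)
        wav.setframerate(rate)
        n_frames = int(rate * seconds)
        wav.writeframes(b"\x00" * n_frames * channels * width)
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    print(check_wav_format(make_test_wav(1, 2, 16000)))  # []
    print(check_wav_format(make_test_wav(2, 2, 44100)))
    # ['channels=2, expected 1', 'rate=44100 Hz, expected 16000']
```

To check a real corpus, one could iterate over `pathlib.Path(corpus_dir).glob("*.wav")` and pass `str(path)` to `check_wav_format`, since `wave.open` treats only `str` arguments as file names.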
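Item 2 introduces SAMPA-DX, but the abstract does not list its symbols. SAMPA-style alphabets typically mix single-character and multi-character ASCII symbols (standard SAMPA writes the consonant in English "church" as `tS`, for instance), so a transcription must be split into phones before it can be used. The following is a minimal sketch assuming a greedy longest-match tokenizer; the symbol inventory, the function name `tokenize_sampa` and the example string are hypothetical placeholders, not the real SAMPA-DX.

```python
# HYPOTHETICAL inventory: the actual SAMPA-DX symbol set is not given in the abstract.
HYPOTHETICAL_INVENTORY = {
    "a", "e", "i", "o", "u", "@",
    "b", "p", "d", "t", "g", "k", "m", "n", "N",
    "s", "S", "x", "l", "r", "j", "w",
    "ts", "tS", "dZ",
}


def tokenize_sampa(transcription: str, inventory: set[str]) -> list[str]:
    """Split a SAMPA-style string into phone symbols by greedy longest match.

    Whitespace separates words and is dropped. Raises ValueError on an unknown symbol.
    """
    max_len = max(len(sym) for sym in inventory)
    phones = []
    i = 0
    while i < len(transcription):
        if transcription[i].isspace():
            i += 1
            continue
        # Try the longest possible symbol first, then shorter ones.
        for size in range(min(max_len, len(transcription) - i), 0, -1):
            candidate = transcription[i:i + size]
            if candidate in inventory:
                phones.append(candidate)
                i += size
                break
        else:
            raise ValueError(f"unknown symbol at position {i}: {transcription[i]!r}")
    return phones


if __name__ == "__main__":
    print(tokenize_sampa("tSaN tsu", HYPOTHETICAL_INVENTORY))
    # ['tS', 'a', 'N', 'ts', 'u']
```

Greedy longest match is a simple design choice. If a scheme contains symbols that are also sequences of other symbols (here `ts` versus `t` followed by `s`), the transcription convention itself has to make the intended reading unambiguous, for example with explicit separators between phones.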
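Item 3 relies on context analysis to turn a phone sequence into context-dependent labels for the HMM and DNN acoustic models. The sketch below assumes quinphone contexts written in the `LL^L-C+R=RR` notation used by HTS-style full-context labels, with `x` marking positions outside the utterance. The trailing `@fw_bw` field (forward and backward position of the phone within the utterance) is a simplification introduced here for illustration; practical label sets add many syllable-, word- and phrase-level features. The function name `quinphone_labels` and the phone strings are hypothetical.

```python
PAD = "x"  # marks an undefined context beyond the utterance edges


def quinphone_labels(phones: list[str]) -> list[str]:
    """Return one simplified context-dependent label per phone.

    Format: LL^L-C+R=RR@fw_bw, where fw/bw are 1-based positions of the
    centre phone counted from the start and from the end of the utterance.
    """
    n = len(phones)
    padded = [PAD, PAD] + phones + [PAD, PAD]
    labels = []
    for i, centre in enumerate(phones):
        # phones[i] sits at padded[i + 2]; take two neighbours on each side.
        ll, l = padded[i], padded[i + 1]
        r, rr = padded[i + 3], padded[i + 4]
        labels.append(f"{ll}^{l}-{centre}+{r}={rr}@{i + 1}_{n - i}")
    return labels


if __name__ == "__main__":
    for label in quinphone_labels(["sil", "m", "a", "sil"]):
        print(label)
    # x^x-sil+m=a@1_4
    # x^sil-m+a=sil@2_3
    # sil^m-a+sil=x@3_2
    # m^a-sil+x=x@4_1
```

In an HMM-based system, labels like these typically index context-dependent models that are then clustered with decision trees; in a DNN-based system, the same contexts are usually encoded as numeric input features. Both uses are general practice rather than details confirmed by the abstract.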
Keywords/Search Tags: Speech generation in Dongxiang, non-text dialect speech generation, HMM-based speech synthesis, DNN-based speech synthesis, context analysis