Research On Speech Synthesis For Yi Language Based On Deep Learning

Posted on: 2021-03-02  Degree: Master  Type: Thesis
Country: China  Candidate: X L Bu  Full Text: PDF
GTID: 2428330623982075  Subject: Intelligent information processing
Abstract/Summary:
Artificial intelligence is developing rapidly, and deep learning has broad application prospects. Speech synthesis, an important technology in this field, is becoming increasingly mature. To obtain synthetic speech with high quality, high intelligibility and high naturalness, researchers no longer rely on parametric methods alone and increasingly prefer deep learning methods. Major languages such as Chinese and English benefit from large, accessible data resources, so neural network and deep learning methods have made speech synthesis for these languages better and more widely applicable. However, systematic research on speech synthesis for minority languages remains relatively rare, and methods for synthesizing minority languages, dialects and low-resource languages are still scarce. The Yi people are the sixth largest ethnic minority in China and have their own distinctive humanities, politics, customs and culture, so the Yi language has important research value.

This thesis takes minority-language speech synthesis as its target and selects the Yi language for study. It investigates text analysis for Yi speech synthesis and implements Yi speech synthesis with deep neural network (DNN) and end-to-end (E2E) methods. It also proposes an improved E2E model that synthesizes high-quality speech while effectively reducing the size of the training set. The main work and contributions are as follows.

Firstly, a Yi language corpus is designed and built, and a phonetic conversion dictionary and a segmentation dictionary for Yi text analysis are collected, sorted and established. Text analysis of the Yi language is implemented based on the linguistic features of its initials, finals and tones. The format of the context-dependent label is designed, and a question set for Yi speech synthesis is established.

Secondly, speech synthesis of the Yi language is implemented with a DNN model, using the text analysis, context-dependent labels and question set described above. During speech processing, the WORLD vocoder is used to extract the acoustic parameters and to restore the speech waveform. The experimental results are evaluated both subjectively and objectively. The Mel-Cepstral Distortion (MCD) is 5.418 dB. The Mean Opinion Score (MOS), rated by Yi undergraduates, reaches 3.93, compared with 4.58 for the original speech.

Finally, the thesis proposes an improved E2E model for speech synthesis of the Yi language. The encoder network is changed, the text analysis module is integrated, and expert knowledge such as the question set is used. The Griffin-Lim algorithm is then used to restore the speech waveform from the spectrogram data. The baseline and improved models are evaluated both subjectively and objectively. The MCD is 4.426 dB. The MOS, rated by Yi undergraduates, reaches 4.19, compared with 4.47 for the original speech. The amount of corpus data needed by the model is reduced while the quality of the synthesized speech is maintained.
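For readers unfamiliar with the objective metric reported above, the following is a minimal sketch of how Mel-Cepstral Distortion is commonly computed. It is not taken from the thesis. It assumes the reference and synthesized mel-cepstral frames are already time-aligned (for example by dynamic time warping), that the 0th (energy) coefficient is excluded, and that the result is averaged over frames. The function name `mel_cepstral_distortion` and the toy input values are hypothetical.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB over time-aligned frames.

    Assumptions (not from the thesis): each frame is a sequence of
    mel-cepstral coefficients, index 0 is the energy term and is skipped,
    and both sequences are already aligned frame by frame.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("frame sequences must be non-empty and time-aligned")

    # Constant factor of the common MCD definition: (10 / ln 10) * sqrt(2)
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        if len(ref) != len(syn):
            raise ValueError("frames must have the same number of coefficients")
        # Squared Euclidean distance over coefficients 1..D (0th skipped)
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(squared)

    return total / len(ref_frames)


# Toy example with made-up coefficient values, for illustration only
reference = [[1.0, 0.50, -0.20], [0.9, 0.40, -0.10]]
synthesized = [[1.1, 0.45, -0.25], [0.8, 0.42, -0.05]]
print(f"MCD: {mel_cepstral_distortion(reference, synthesized):.3f} dB")
```

A lower value means the synthesized spectrum is closer to the reference, which is the sense in which 4.426 dB improves on 5.418 dB.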
Keywords/Search Tags:minority language speech synthesis, text analysis, DNN, E2E