Research On Tibetan Voice Conversion Based On Deep Learning

Posted on:2021-01-18

Degree:Master

Type:Thesis

Country:China

Candidate:G Y Zhao

GTID:2415330629488956

Subject:Engineering

Abstract/Summary:

Voice conversion is a speech waveform modification technique that changes the non-semantic information in speech while leaving the source speaker's semantic content unchanged, so that the converted speech carries the target speaker's individual characteristics. At present, most mainstream voice conversion methods rely on parallel corpora. For Tibetan, a low-resource language, parallel corpora are expensive to collect, and aligning acoustic features is error-prone. This thesis therefore studies Tibetan voice conversion under both parallel and non-parallel corpus conditions. The main work is as follows:

(1) The theory and workflow of voice conversion are reviewed, and the WORLD vocoder is used both to extract acoustic parameters from speech and to synthesize speech.

(2) A text corpus in the WeiZang dialect of Tibetan is designed for voice conversion, laying the foundation for the conversion experiments. The text corpus is designed to cover the various phoneme combinations of the WeiZang dialect and to keep the frequencies of different phonemes as balanced as possible, which avoids data sparseness. Once the text corpus is designed, the corresponding speech corpus is recorded and segmented.

(3) Under parallel corpus conditions, a Deep Neural Network (DNN) and a Generative Adversarial Network (GAN) are applied to the conversion of Tibetan speech spectral parameters. Experiments show that both the DNN and the GAN achieve Tibetan voice conversion, and that both convert better than the method based on the Gaussian mixture model (GMM).

(4) Because Tibetan parallel corpora are limited, the thesis also studies a more flexible and general form of Tibetan voice conversion under non-parallel corpus conditions. Building on the GAN above, Tibetan voice conversion methods based on the CycleGAN and StarGAN networks are proposed. Experiments show that CycleGAN-based conversion comes close to GMM-based conversion under parallel corpus conditions. Moreover, the CycleGAN-based method performs bidirectional "one-to-one" conversion, whereas the GMM method performs "one-to-one" conversion in a single direction. StarGAN-based conversion is worse than GMM-based conversion under parallel corpus conditions, but the StarGAN method performs "many-to-many" conversion, which is more flexible and efficient.
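The corpus-design goal in item (2), covering phoneme combinations while keeping phoneme frequencies balanced, can be illustrated with a small greedy selection sketch. The abstract does not say which selection procedure the thesis uses, so the greedy rule below is an assumption, and the function name `select_sentences`, the sentence ids, and the phoneme symbols are hypothetical placeholders rather than the actual WeiZang phoneme inventory.

```python
from collections import Counter


def select_sentences(candidates, budget):
    """Greedily pick up to `budget` sentences, favoring phonemes seen least so far.

    candidates: dict mapping a sentence id to its list of phoneme symbols.
    Returns the chosen ids (in selection order) and the resulting phoneme counts.
    """
    counts = Counter()
    chosen = []
    remaining = dict(candidates)
    while remaining and len(chosen) < budget:

        def score(sid):
            # Each distinct phoneme adds 1 / (1 + times already covered),
            # so unseen or rare phonemes are worth more than common ones.
            return sum(1.0 / (1 + counts[p]) for p in set(remaining[sid]))

        # Sorting the ids first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        chosen.append(best)
        counts.update(remaining.pop(best))
    return chosen, counts


# Hypothetical toy candidates; real input would be phonemized Tibetan sentences.
candidates = {
    "s1": ["k", "a", "t", "a"],
    "s2": ["k", "a", "k", "a"],
    "s3": ["ng", "i", "t", "a"],
    "s4": ["p", "u", "m"],
}

chosen, counts = select_sentences(candidates, budget=2)
print(chosen)
print(sorted(counts.items()))
```

With a budget of two sentences, the sketch prefers the two candidates that together introduce the most distinct phonemes. A real design would likely also score phoneme combinations, for example by counting adjacent pairs instead of single symbols.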

Keywords/Search Tags:

Tibetan, Voice Conversion, Parallel/Non-parallel Corpus, Generative Adversarial Network (GAN)

Related items

1	Research On Voice Conversion From Tibetan Amdo To U-tsang Dialect Based On Deep Learning
2	Colorless Video Rendering System Via Generative Adversarial Networks
3	Research On Calligraphic Character Generation Based On Generative Adversarial Network
4	Research On Sentence Structure Of English And Tibetan In A Bilingual Parallel Corpus
5	Research On Intelligent Restoration And Optimization Methods Of Ancient Mural Images Based On Generative Adversarial Network
6	Research On Calligraphy Font Generation Based On Generative Adversarial Network
7	Parallel Processing On Parallel Corpus Of Chinese-English
8	Multi-style Chinese Character Font Generation System Based On Generative Adversarial Networks
9	Research On Generative Adversarial Network And Its Application In Graphic Design
10	Research On Generating Chinese Calligraphy Characters Based On Generative Adversarial Networks