
Research On Mandarin Singing Synthesis Based On Deep Learning

Posted on: 2022-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhou
Full Text: PDF
GTID: 2518306500456984
Subject: Intelligent information processing
Abstract/Summary:
Singing plays an important role in expressing and relieving emotions. Singing synthesis can create virtual characters with unique voices, dub songs for characters in science fiction animations, and recreate the singing of deceased singers in their own timbre. It can also support singing education in areas that lack teachers. In traditional singing synthesis based on Hidden Markov Models (HMM), the maximum likelihood criterion or the minimum mean square error criterion has a statistical averaging effect that over-smooths the synthesized output, so the quality of the synthesized singing is poor and synthesis takes a long time. To address these problems, this thesis synthesizes singing with three methods in order to improve synthesis quality. The main work of this thesis is as follows:

First, we built a song database. We selected 100 songs, about 300 minutes in total, and invited a professional singer to record them in a professional recording environment at a 48 kHz sampling rate and 16-bit depth. The recordings were then cut into complete short sentences, and 1560 sentences of 8 to 15 s each were obtained after proofreading and testing. We used Praat to label the start and end times of the short sentences. The lyrics in the music scores were then converted into pinyin, which is easier for a computer to process. After proofreading, the music scores were analyzed to obtain context-dependent labels.

Second, we implemented singing synthesis based on Deep Neural Networks (DNN). Acoustic features were extracted from the constructed corpus, and a DNN was trained to model vibrato, time-lag, and duration. The synthesized singing was evaluated, and its average mean opinion score (MOS) was 2.68.

Third, we proposed singing synthesis based on a Generative Adversarial Network (GAN). The acoustic features required by the model were extracted from the constructed corpus. The GAN generator produced the singing, and the discriminator judged the quality of the synthesis. Training minimized the weighted sum of the conventional minimum generation loss and an adversarial loss for deceiving the discriminator. After repeated updates of the generator and the discriminator, the synthesized singing was obtained. The synthesized singing was evaluated, and its average MOS was 3.05.

Finally, we implemented singing synthesis based on FastSpeech. Phoneme IDs, note durations, and note pitches were extracted from the constructed corpus and fed to the encoder and decoder. The resulting acoustic features were passed through the WORLD vocoder to produce singing. The synthesized singing was evaluated, and its average MOS was 3.45.
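
The GAN training objective mentioned in the abstract, a weighted sum of a generation loss and an adversarial loss, can be illustrated with a minimal sketch. The abstract does not give the exact loss definitions or the weight, so everything here is an assumption for illustration: mean squared error stands in for the generation loss, the adversarial term is the negative log of the discriminator's "natural" probability, and the names `mse`, `adversarial_loss`, `generator_loss`, and `adv_weight` are hypothetical.

```python
import math


def mse(predicted, target):
    """Mean squared error between two equal-length feature sequences.

    Assumption: stands in for the 'conventional minimum generation loss'.
    """
    if len(predicted) != len(target):
        raise ValueError("feature sequences must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def adversarial_loss(disc_probs):
    """Average of -log D(G(x)) over generated frames.

    disc_probs holds the discriminator's probability that each generated
    frame is natural. The loss is small when the discriminator is fooled.
    """
    eps = 1e-12  # guard against log(0)
    return -sum(math.log(max(p, eps)) for p in disc_probs) / len(disc_probs)


def generator_loss(predicted, target, disc_probs, adv_weight=1.0):
    """Weighted sum of generation loss and adversarial loss.

    adv_weight is a hypothetical hyperparameter; the abstract does not
    state the weighting that was used.
    """
    return mse(predicted, target) + adv_weight * adversarial_loss(disc_probs)


if __name__ == "__main__":
    generated = [0.9, 2.1, 2.8]   # toy generated acoustic features
    natural = [1.0, 2.0, 3.0]     # toy natural acoustic features
    # Compare the loss when the discriminator is fooled vs. not fooled.
    print(generator_loss(generated, natural, [0.9, 0.8, 0.9]))
    print(generator_loss(generated, natural, [0.1, 0.2, 0.1]))
```

With identical features, the loss is lower when the discriminator assigns high "natural" probabilities to the generated frames, which is the pressure that pushes the generator away from over-smoothed output. Setting `adv_weight` to 0 reduces the objective to the generation loss alone.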
Keywords/Search Tags: Singing voice synthesis, Singing corpus construction, DNN, GAN, FastSpeech