
Research On Mandarin Singing Synthesis Based On Wavenet Architecture

Posted on: 2021-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y R You
Full Text: PDF
GTID: 2428330629988955
Subject: Engineering
Abstract/Summary:
Singing is a form of music expressed through the human voice, and one of the most expressive uses of it. Singing voice synthesis (SVS) refers to synthesizing a human singing voice by computer, building on speech synthesis technology. As a newer application of speech synthesis, SVS has practical value and good prospects in virtual singers, record production, digital music creation, and related areas. Although voice synthesis methods have advanced as research has developed, Mandarin singing synthesis has not yet been studied in depth. Singing synthesis is more challenging than speech synthesis because it places more weight on rendering the melody and must also process song information such as rhythm and tonality.

This thesis describes work on singing synthesis algorithms based on statistical parametric models. Building on existing Mandarin speech synthesis and singing synthesis, it proposes an improved Mandarin singing synthesis model based on the WaveNet architecture that can produce high-quality singing from a small training corpus. The main work and contributions of this thesis are as follows.

First, we established a singing corpus for Mandarin singing synthesis. Based on the rhythm, tonality, and other characteristics of the songs, we built Music Extensible Markup Language (MusicXML) score files covering 90 selected songs and designed a recording scheme. A professional adult female singer recorded the corpus in a professional recording studio. The recorded Mandarin singing corpus is 169 minutes long in total and provides a solid data foundation for the subsequent work on Mandarin singing synthesis.

Second, we proposed a musical score analysis method that derives context-dependent labels from the MusicXML score files of Mandarin songs. We designed a context-dependent label format with five layers: phoneme, syllable, music information, phrase, and song. Given the MusicXML score file of a Mandarin song as input, the method generates the context-dependent labels used for acoustic modeling.

Third, we developed a Mandarin singing voice synthesis method based on statistical parametric speech synthesis. From the input MusicXML score files, this method synthesizes singing voices with more accurate rhythm and pitch, moderate sound intensity, and a personalized timbre.

Fourth, we proposed a Mandarin singing synthesis method based on the WaveNet architecture. In this method, the features extracted by the parametric vocoder are modeled separately, and the voice is reconstructed by the WORLD vocoder. The extracted features are the spectral envelope (SP), the aperiodicity envelope (AP), the fundamental frequency (F0), and the voiced/unvoiced (V/UV) decision. Subjective and objective experimental results show that the method can synthesize acceptable Mandarin singing voices from a limited singing database.
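To make the score analysis step more concrete, the following is a minimal sketch of how note-level information might be read from a MusicXML score and turned into context-dependent labels. It is not the thesis's implementation: the function names `parse_notes` and `context_labels`, the label string format, and the tiny example score are all assumptions made for illustration, and only one layer (note-level music information with the neighboring notes as context) is shown. The MusicXML elements used (`measure`, `attributes/divisions`, `note`, `pitch`, `duration`, `lyric/text`) are standard.

```python
import xml.etree.ElementTree as ET

# Semitone offsets of the natural note names within an octave.
STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Hypothetical two-note score with a trailing rest (not from the thesis corpus).
EXAMPLE = """<score-partwise version="3.1">
  <part id="P1">
    <measure number="1">
      <attributes><divisions>2</divisions></attributes>
      <note>
        <pitch><step>C</step><octave>4</octave></pitch>
        <duration>2</duration>
        <lyric><text>wo</text></lyric>
      </note>
      <note>
        <pitch><step>F</step><alter>1</alter><octave>4</octave></pitch>
        <duration>1</duration>
        <lyric><text>ai</text></lyric>
      </note>
      <note><rest/><duration>1</duration></note>
    </measure>
  </part>
</score-partwise>"""


def parse_notes(musicxml_text):
    """Return one dict per <note>: lyric syllable, MIDI pitch, length in beats."""
    root = ET.fromstring(musicxml_text)
    divisions = 1  # MusicXML duration units per quarter note
    notes = []
    for measure in root.iter("measure"):
        for elem in measure:
            if elem.tag == "attributes":
                div = elem.findtext("divisions")
                if div is not None:
                    divisions = int(div)
            elif elem.tag == "note":
                duration = int(elem.findtext("duration", default="0"))
                pitch = elem.find("pitch")
                if pitch is None:  # rests carry no <pitch> element
                    midi = None
                else:
                    step = pitch.findtext("step")
                    # Assumption: alterations are whole semitones.
                    alter = round(float(pitch.findtext("alter", default="0")))
                    octave = int(pitch.findtext("octave"))
                    midi = 12 * (octave + 1) + STEP_TO_SEMITONE[step] + alter
                notes.append({
                    "syllable": elem.findtext("lyric/text"),
                    "midi": midi,
                    "beats": duration / divisions,
                })
    return notes


def _fmt(value):
    """Render a missing value (rest, score boundary) as 'x'."""
    return "x" if value is None else str(value)


def context_labels(notes):
    """Build hypothetical labels of the form prev^cur+next@beats=.../syl=..."""
    labels = []
    for i, note in enumerate(notes):
        prev_midi = notes[i - 1]["midi"] if i > 0 else None
        next_midi = notes[i + 1]["midi"] if i + 1 < len(notes) else None
        labels.append(
            f"{_fmt(prev_midi)}^{_fmt(note['midi'])}+{_fmt(next_midi)}"
            f"@beats={note['beats']}/syl={_fmt(note['syllable'])}"
        )
    return labels


if __name__ == "__main__":
    for label in context_labels(parse_notes(EXAMPLE)):
        print(label)
```

A full label set along the lines described above would also need the phoneme, syllable, phrase, and song layers, for example by converting each lyric syllable to Mandarin phonemes and by counting positions within phrases. Those steps are left out here because the abstract does not specify them.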
Keywords/Search Tags: Singing voice synthesis, Singing voice database, Music score analysis, Hidden Markov model, WaveNet