
Research on Synthesis Methods of Singing Oriented to Timbre Conversion

Posted on: 2019-06-06    Degree: Master    Type: Thesis
Country: China    Candidate: Z M Qi
GTID: 2428330575950238    Subject: Software engineering
Abstract/Summary:
In recent years, with the rise and maturing of artificial intelligence, pattern recognition and related technologies, people increasingly demand more intelligent human-computer interaction, and this demand has made audio signal processing a hot research area. Timbre conversion of the singing voice is an emerging topic within audio signal processing. Research results specific to singing voice timbre conversion are scarce, and most existing work inherits the theory and methods of speech timbre conversion. However, most speech timbre conversion methods are "one-to-one": they rely on parallel datasets and target only one specific source speaker and one specific target speaker. Such models generalize poorly and are ill-suited to singing voice timbre conversion. To address these issues, this thesis focuses on "many-to-many" singing voice timbre conversion with non-parallel datasets. Taking singers' singing voice signals as the research object, it first constructs a voice timbre representation model that yields a reasonable timbre representation for each singer. Based on this representation model, it then builds a timbre conversion model oriented to the singing voice and uses it to convert singing timbre. Finally, it applies noise reduction to the converted singing voice.

First, this thesis presents a voice timbre representation method based on a deep convolutional neural network and transfer learning. An instrument timbre representation model based on a deep convolutional neural network is trained on a large instrument audio dataset; fine-tuning this model on a small singing voice audio dataset then yields a voice timbre representation model. Experimental results show that the timbre features extracted by both the instrument timbre representation model and the voice timbre representation model perform well in timbre classification experiments.

Next, building on the voice timbre representation model, this thesis presents a singing voice timbre conversion model based on a Variational Auto-Encoder (VAE) and Generative Adversarial Networks (GAN). This model removes the dependency on parallel datasets and supports "many-to-many" timbre conversion of both speech and singing voice. Experimental results show that speech and singing voice converted by this model achieve better MFCC distortion and MOS (mean opinion score) results than a traditional "one-to-one" timbre conversion model trained on parallel datasets.

Finally, a deep neural network built from Restricted Boltzmann Machines (RBMs) is used to denoise the timbre-converted audio. Experimental results show that this denoising model effectively suppresses noise in the converted audio and improves its quality and signal-to-noise ratio (SNR).
Keywords/Search Tags: singing synthesis, voice timbre representation, singing voice timbre conversion, audio denoising
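The transfer-learning step in the abstract (train on a large instrument dataset, then adapt to a small singing dataset) can be illustrated with a minimal sketch. It assumes the simplest variant: the pretrained extractor is kept frozen and only a new softmax classification head is trained. The thesis fine-tunes a deep CNN, which may also update the pretrained layers. Here that CNN is replaced by a hypothetical one-layer stand-in, and all function names are illustrative only.

```python
import math


def frozen_features(x, pretrained_w):
    """Hypothetical stand-in for a pretrained extractor: fixed linear layer + ReLU.

    pretrained_w is treated as frozen and is never modified below.
    """
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in pretrained_w]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_head(samples, labels, pretrained_w, n_classes, lr=0.5, epochs=200):
    """Fit a new softmax head on the frozen features with plain SGD."""
    feats = [frozen_features(x, pretrained_w) for x in samples]
    dim = len(feats[0])
    head_w = [[0.0] * dim for _ in range(n_classes)]
    head_b = [0.0] * n_classes
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            probs = softmax([sum(w * fi for w, fi in zip(head_w[k], f)) + head_b[k]
                             for k in range(n_classes)])
            for k in range(n_classes):
                grad = probs[k] - (1.0 if k == y else 0.0)  # d(cross-entropy)/d(logit_k)
                for i in range(dim):
                    head_w[k][i] -= lr * grad * f[i]
                head_b[k] -= lr * grad
    return head_w, head_b


def predict(x, pretrained_w, head_w, head_b):
    f = frozen_features(x, pretrained_w)
    scores = [sum(w * fi for w, fi in zip(row, f)) + b for row, b in zip(head_w, head_b)]
    return max(range(len(scores)), key=scores.__getitem__)
```

Freezing the backbone keeps the number of trainable parameters small, which suits a small target dataset. Full fine-tuning would also update `pretrained_w`, usually with a smaller learning rate.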
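The abstract does not detail the VAE/GAN architecture. A common idea behind VAE-based "many-to-many" conversion, assumed here, works as follows. The encoder maps a frame to a latent code that ideally carries content but not timbre. The decoder then reconstructs the frame from that code plus a speaker or singer embedding, and conversion swaps in the target's embedding. The sketch below shows only the VAE pieces: the reparameterization trick, the Gaussian KL term, and the hypothetical concatenation-style conditioning. The adversarial (GAN) loss and the neural networks themselves are omitted.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps, with eps ~ N(0, 1) and sigma = exp(0.5 * logvar)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)) for one latent vector (always >= 0)."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def decoder_input(z, speaker_embedding):
    """Hypothetical conditioning: concatenate the latent code with a timbre embedding."""
    return list(z) + list(speaker_embedding)


def convert(mu, logvar, target_embedding, rng):
    """Conversion idea: keep the encoded content latent, swap in the target timbre."""
    z = reparameterize(mu, logvar, rng)
    return decoder_input(z, target_embedding)
```

For example, `kl_to_standard_normal([1.0], [0.0])` evaluates to `0.5`. Because the decoder input is built from a shared latent space plus an interchangeable embedding, one trained model can in principle serve any source/target pair seen in training, and no parallel data is needed.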
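The abstract evaluates conversion by "MFCC distortion" without giving its exact definition. One widely used form is mel-cepstral distortion, shown here as an assumption. It computes (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2) per frame, usually excluding the energy coefficient c0, and averages the result over frames. The sketch assumes the two sequences are already time-aligned (for example by dynamic time warping, not shown).

```python
import math


def mel_cepstral_distortion(ref_frames, conv_frames, skip_c0=True):
    """Average mel-cepstral distortion (dB) between two time-aligned frame sequences."""
    if not ref_frames or len(ref_frames) != len(conv_frames):
        raise ValueError("need two non-empty, equally long (aligned) frame sequences")
    start = 1 if skip_c0 else 0  # optionally ignore the energy term c0
    const = 10.0 / math.log(10.0)
    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        sq = sum((r - c) ** 2 for r, c in zip(ref[start:], conv[start:]))
        total += const * math.sqrt(2.0 * sq)
    return total / len(ref_frames)
```

Lower values mean the converted cepstra are closer to the reference. Identical inputs give `0.0`.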
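The denoiser is described only as "a deep neural network model composed of Restricted Boltzmann Machines". A common way to build such a network, assumed here, is greedy layer-wise pretraining. Each RBM is trained with one-step contrastive divergence (CD-1) on the hidden activations of the RBM below, and the resulting weights then initialize a network that is fine-tuned to map noisy features to clean ones (that fine-tuning is not shown). For brevity the sketch uses binary (Bernoulli) units. Real-valued spectral features would more typically use a Gaussian visible layer in the first RBM. The class and function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class BernoulliRBM:
    """Tiny RBM trained with CD-1 on one visible vector at a time."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr=0.1):
        """One CD-1 step; uses mean-field reconstructions in the negative phase."""
        ph0 = self.hidden_probs(v0)          # positive phase
        h0 = self.sample(ph0)
        pv1 = self.visible_probs(h0)         # reconstruction
        ph1 = self.hidden_probs(pv1)         # negative phase
        for i in range(len(self.b)):
            for j in range(len(self.c)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - pv1[i])
        for j in range(len(self.c)):
            self.c[j] += lr * (ph0[j] - ph1[j])
        return sum((a - p) ** 2 for a, p in zip(v0, pv1))  # squared reconstruction error


def pretrain_stack(data, layer_sizes, epochs=20, lr=0.1):
    """Greedy layer-wise pretraining: each RBM learns from the hidden probabilities below it."""
    rbms, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = BernoulliRBM(len(inputs[0]), n_hidden, seed=len(rbms))
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_update(v, lr)
        inputs = [rbm.hidden_probs(v) for v in inputs]
        rbms.append(rbm)
    return rbms
```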
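The abstract reports that denoising raises the signal-to-noise ratio. The standard definition, assumed here, is 10 * log10(sum s^2 / sum (s - s_hat)^2) for a clean reference s and an estimate s_hat. The abstract does not say which clean reference the thesis used for converted audio, and this sketch does not address that question.

```python
import math


def snr_db(clean, estimate):
    """SNR (dB) of an estimate relative to a clean reference signal of equal length."""
    if len(clean) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_power = sum(s * s for s in clean)
    noise_power = sum((s - e) ** 2 for s, e in zip(clean, estimate))
    if noise_power == 0.0:
        return math.inf  # a perfect estimate has no residual noise
    return 10.0 * math.log10(signal_power / noise_power)
```

For instance, adding a constant offset of 0.1 to the unit-amplitude signal `[1.0, -1.0, 1.0, -1.0]` gives about 20 dB, and shrinking the residual to 0.01 gives about 40 dB. A successful denoiser should move the SNR upward in this way.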