High-quality Voice Conversion From Non-parallel Corpora Based On Variational Auto-encoder And Bottleneck Feature

Posted on: 2019-03-25

Degree: Master

Type: Thesis

Country: China

Candidate: Y Z Ling

GTID: 2428330566999285

Subject: Electronic and communication engineering

Abstract/Summary:

Speech is the signal produced when a speaker vocalizes. It is easy to collect and carries several kinds of information, including semantic content, the speaker's identity, and emotion. Voice conversion modifies the personal characteristics of a source speaker's speech so that it takes on those of a target speaker, while keeping the semantic content unchanged. In recent years, deep learning has attracted worldwide research attention, and some researchers have applied deep learning models to voice conversion with encouraging results. Because deep learning models can capture the intrinsic features of complex signals, research efficiency has improved. As research on deep learning intensifies, new concepts and models continue to be applied to voice conversion and to solve practical problems. Applying deep learning to voice conversion can also help advance other areas of speech signal processing and further improve the efficiency of intelligent speech devices and intelligent human-computer interaction. The study of voice conversion with deep learning methods therefore has broad prospects and considerable theoretical and practical value.

This thesis focuses on a voice conversion model based on a variational auto-encoder (VAE) and Bottleneck features. When the decoder of a VAE is trained, the label feature in the hidden layer is not fully utilized. The proposed algorithm therefore uses the Bottleneck feature obtained from a deep neural network (DNN) as the speaker label, which makes fuller use of the label features in the VAE model and improves voice conversion performance. Furthermore, for the case where the training data of the target speaker is limited, a method of intervening in the DNN training process is proposed, which addresses the many-to-many (M2M) voice conversion problem by enriching the target speaker's personality feature space.

Experiments show that the Mel-cepstral distortion (MCD) of the proposed method is lower than that of the baseline system, by 5.39% on average under non-parallel training conditions, which indicates that the converted speech is spectrally closer to the target speech. In terms of speech quality, the PESQ-MOS value is 24% higher on average, indicating that the speech produced by the model is of better quality. In the VAE+Bottleneck experiment in which the target speaker is not fully trained, the DNN training process was intervened in, and in the listening test 29.0% of the results showed no difference between the sufficient and limited training data conditions. Overall, the analytical and experimental results show that the proposed method yields converted speech with higher spectral similarity and higher PESQ-MOS values than the baseline.
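
The abstract reports MCD as a relative reduction against a baseline but does not give the formula used. As a rough illustration only, the sketch below computes MCD in its commonly used form, (10 / ln 10) * sqrt(2 * sum of squared coefficient differences), averaged over frames. It assumes the target and converted Mel-cepstral sequences are already time-aligned and that the 0th (energy) coefficient is excluded. The function names `mel_cepstral_distortion` and `relative_reduction` are hypothetical and are not taken from the thesis.

```python
import numpy as np


def mel_cepstral_distortion(target, converted):
    """Mean MCD in dB between two time-aligned Mel-cepstral sequences.

    Both inputs have shape (frames, coefficients). Assumption: column 0 is
    the energy term and is excluded, as is common practice.
    """
    target = np.asarray(target, dtype=float)
    converted = np.asarray(converted, dtype=float)
    if target.shape != converted.shape:
        raise ValueError("sequences must be time-aligned and equally shaped")
    diff = target[:, 1:] - converted[:, 1:]
    # Per-frame distortion, then the average over all frames.
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(per_frame.mean())


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline
```

With per-speaker-pair MCD values for the baseline and the proposed system, `relative_reduction` would give the kind of percentage quoted above. The thesis's exact averaging procedure is not stated in the abstract, so this is an assumption rather than a reproduction of its evaluation.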

Keywords/Search Tags:

AHOcoder, MFCC, Variational Auto-encoder (VAE), Bottleneck feature, Many-to-Many (M2M) voice conversion

Related items

1	Research On Many To Many Voice Conversion Based On I-vector And Improved Variational Autoencoder For Non-parallel Corpora
2	Non-parallel Voice Conversion Using ACGAN And Variational Autoencoders Conditioned By Sentence Embedding
3	Research On Many-to-Many Voice Conversion Based On I-vector, Variational Auto-encoder And Generative Adversarial Networks For Non-parallel Corpora
4	Research On High Quality Voice Conversion Algorithm Based On Improved GMM And Frequency Warping
5	Deep Auto-encoder Framework For SAR Images Change Detection
6	Voice Conversion Based On AHOcoder And GMM Model
7	Research And Application Of Representation Learning Based On Variational Auto-encoder
8	Study On Feature Parameters In Voice Conversion
9	Studies On Key Techniques For Voice Conversion
10	Research On Algorithms Based On Variational Information Encoding And Convolutional Neural Networks For Signal Classification In Brain-Computer Interface