
Research On Any-to-many Voice Conversion Based On Non-parallel Data

Posted on: 2022-06-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Yang
Full Text: PDF
GTID: 2518306569465744
Subject: Electronics and Communications Engineering
Abstract/Summary:
Voice Conversion (VC) aims to reconstruct speech with new speaker information while preserving the linguistic content. It is an important branch of speech synthesis and one of the active research topics in voice interaction, with broad application prospects in fields such as medical treatment, virtual live streaming, and anti-fraud. In recent years, voice conversion based on deep learning has made great progress, and intra-lingual voice conversion can already achieve high naturalness and similarity. However, how to effectively disentangle speaker information from linguistic information, and how to alleviate cross-lingual domain mismatch under non-parallel corpora, remain key technical issues in VC. To address these problems, we study voice conversion based on deep learning. The main research work of this thesis is as follows:

(1) To effectively disentangle speaker information from linguistic information, this thesis proposes an any-to-many voice conversion method based on phoneme embedding. Most voice conversion systems based on phonetic posteriorgrams (PPGs) cannot balance naturalness and similarity on low-resource data. Our algorithm uses a disentangled phoneme-embedding linguistic representation in place of PPGs. Combined with a speaker embedding, we use pitch self-supervision to constrain the converted speech of the target speaker, and adopt multi-step output and random learning strategies to improve the context modeling and generalization ability of the voice conversion system. Experimental results show that the proposed model achieves better performance in mel-cepstral distortion, word error rate, and subjective evaluation.

(2) To address cross-lingual domain mismatch under non-parallel corpora, this thesis proposes a cross-lingual voice conversion algorithm based on time-frequency feature enhancement and speaker domain adversarial training. Most cross-lingual voice conversion algorithms are still not able to
adapt to the speaker differences caused by language mismatch, especially when the target language does not appear in the training phase, resulting in lost linguistic content or mispronunciation. The cross-lingual voice conversion method proposed in this thesis uses a mixed-language phoneme recognition model to extract universal linguistic representations (ULRs), and applies speaker domain adversarial training to better remove speaker information. It also uses speaker standardization to reconstruct the target speech more efficiently. In addition, we design an effective multi-scale time-frequency enhancement module to suppress background noise in the speech. Experimental results demonstrate that the algorithm achieves high naturalness and similarity in cross-lingual voice conversion across different languages.

The proposed voice conversion methods are verified both objectively and subjectively, and have great application value.
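The conditioning scheme described in (1) pairs a frame-level linguistic representation with an utterance-level speaker vector. A minimal sketch of this combination, assuming the decoder consumes per-frame concatenations of the two (all shapes and variable names here are illustrative, not taken from the thesis):

```python
import numpy as np

# Hypothetical dimensions: T frames of linguistic features, one speaker vector.
T, D_LING, D_SPK = 120, 256, 64

rng = np.random.default_rng(0)
phoneme_emb = rng.standard_normal((T, D_LING))  # disentangled linguistic representation (stands in for PPGs)
speaker_emb = rng.standard_normal(D_SPK)        # target-speaker embedding

# Broadcast the speaker embedding over time and concatenate per frame,
# forming a decoder input that carries both linguistic and speaker information.
decoder_input = np.concatenate(
    [phoneme_emb, np.tile(speaker_emb, (T, 1))], axis=1
)
print(decoder_input.shape)  # → (120, 320)
```

Because the speaker vector is constant across frames, the decoder can in principle swap speakers at inference time by replacing only `speaker_emb`, which is what enables the any-to-many setting.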
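The pitch self-supervision mentioned in (1) constrains the converted speech's pitch contour. One plausible form of such a constraint, sketched here as an L1 distance between log-F0 contours on mutually voiced frames (the exact loss used in the thesis is not specified; this function is a hypothetical illustration):

```python
import numpy as np

def f0_consistency_loss(f0_src, f0_conv, eps=1e-8):
    """Mean absolute log-F0 difference over frames voiced in both contours."""
    voiced = (f0_src > 0) & (f0_conv > 0)
    if not voiced.any():
        return 0.0
    return float(np.mean(np.abs(np.log(f0_src[voiced] + eps)
                                - np.log(f0_conv[voiced] + eps))))

# Toy contours in Hz; 0.0 marks unvoiced frames.
f0_src = np.array([0.0, 220.0, 230.0, 0.0, 240.0])
f0_conv = np.array([0.0, 210.0, 235.0, 0.0, 250.0])
print(f0_consistency_loss(f0_src, f0_conv))
```

Working in the log-F0 domain makes the penalty relative rather than absolute, so the same contour shape at a different register (e.g. a shifted mean pitch for the target speaker) incurs a uniform, easily normalized offset.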
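The speaker domain adversarial training in (2) pushes the linguistic encoder to produce features from which speaker identity cannot be predicted; in practice this is often realized with a gradient-reversal update. A minimal numerical sketch of the mechanism, using a single linear "encoder" and a squared-error speaker classifier (all of this is illustrative, not the thesis's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(8)       # one frame of acoustic features (hypothetical)
W = rng.standard_normal((4, 8))  # "content encoder": one linear layer for illustration
v = rng.standard_normal(4)       # speaker classifier weights
y = 1.0                          # speaker label (scalar target for a squared loss)
lam = 0.01                       # adversarial weight

def speaker_loss(W):
    s = v @ (W @ x)              # classifier score on the encoded frame
    return (s - y) ** 2

# Gradient of the speaker loss w.r.t. the encoder weights.
r = v @ (W @ x) - y
grad_W = 2.0 * r * np.outer(v, x)

# Reversed-gradient update: the encoder *ascends* the speaker loss,
# making its output less predictive of speaker identity, while the
# classifier (not shown) would descend it as usual.
W_adv = W + lam * grad_W

print(speaker_loss(W), speaker_loss(W_adv))  # the loss increases after the update
```

In a full system this sign flip is implemented once as a gradient-reversal layer between encoder and speaker classifier, so both branches train jointly with ordinary gradient descent.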
Keywords/Search Tags: voice conversion, cross-lingual, disentangled universal linguistic representation, speaker domain adversarial