With the rapid development of speech synthesis technology, people's requirements have moved beyond naturalness and intelligibility: speech synthesis systems are now expected to be diverse and personalized, in particular to support the customization of specific speakers' voices, a task also known as voice cloning. However, personalized Chinese speech synthesis systems are often limited to few-shot scenarios during construction, so their generalization and accuracy cannot be guaranteed. First, audio from the target speaker is scarce, which makes large-scale data-driven voice cloning impractical, and existing few-shot voice cloning methods suffer from complex training processes and overfitting. In addition, the polyphone disambiguation module of a personalized Chinese speech synthesis system is also plagued by data shortages: existing methods often require a large-scale corpus, manual annotation is expensive, and the only open-source dataset is small and of poor quality, so classification accuracy is hard to ensure. In view of these problems, this article carries out the following work.

(1) A few-shot voice cloning method based on phoneme-level speaker features is proposed. The method exploits the fine-grained speaker characteristics in the target speaker's audio and introduces an attention mechanism to transfer speaker features between phonemes, supplemented by a random-sampling training strategy, to improve the utilization of the target speaker's data and achieve high-quality voice cloning in few-shot scenarios.

(2) A polyphone disambiguation method based on meta-learning is proposed. Instead of treating polyphone disambiguation as an ordinary machine-learning classification task, the method compares and distinguishes the semantic features of the different pronunciations of a polyphone, and determines the pronunciation by feature comparison.

Experimental results show that our voice cloning method generalizes better than existing methods, and that the generated speech achieves high similarity to the target speaker's voice as well as high naturalness. Our meta-learning-based polyphone disambiguation method shows excellent disambiguation performance even on the low-quality training dataset, generalizes better than existing methods, and performs well on unseen polyphones.
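The abstract does not give implementation details for the attention mechanism in contribution (1). As one plausible reading, transferring speaker features between phonemes can be sketched as scaled dot-product attention: the embedding of a query phoneme attends over the phoneme embeddings observed in the target speaker's few reference utterances, and the attention weights mix their phoneme-level speaker features. All function names, dimensions, and the choice of dot-product scoring below are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def transfer_speaker_feature(query_phoneme, ref_phonemes, ref_speaker_feats):
    """Sketch of attention-based speaker-feature transfer (assumed design).

    query_phoneme:     (d,)    embedding of the phoneme to synthesize
    ref_phonemes:      (n, d)  phoneme embeddings from reference audio
    ref_speaker_feats: (n, k)  phoneme-level speaker features from reference audio
    Returns the attended speaker feature (k,) and the attention weights (n,).
    """
    d = query_phoneme.shape[-1]
    scores = ref_phonemes @ query_phoneme / np.sqrt(d)  # similarity to each reference phoneme
    weights = softmax(scores)                           # non-negative, sum to 1
    return weights @ ref_speaker_feats, weights

# Illustrative usage with random data
rng = np.random.default_rng(0)
query = rng.normal(size=8)
refs = rng.normal(size=(5, 8))       # 5 reference phonemes
feats = rng.normal(size=(5, 16))     # their phoneme-level speaker features
speaker_feat, attn = transfer_speaker_feature(query, refs, feats)
```

The attended `speaker_feat` would then condition the synthesizer for the query phoneme, so phonemes unseen in the reference audio still borrow speaker characteristics from acoustically similar ones.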
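Contribution (2) replaces a classifier head with feature comparison, which resembles metric-based meta-learning (e.g. prototypical networks). A minimal sketch of that pattern, under the assumption that a semantic encoder has already produced feature vectors: each candidate pronunciation gets a prototype (the mean of its few support features), and a query is assigned to the pronunciation whose prototype is most similar. The pronunciation labels, dimensions, and cosine-similarity choice below are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def disambiguate(query_feat, support_feats):
    """Pick a pronunciation by feature comparison instead of a trained classifier.

    query_feat:    (d,) semantic feature of the polyphone in its sentence
    support_feats: dict mapping pronunciation -> (n_shots, d) support features
    Returns the pronunciation whose prototype is closest in cosine similarity.
    """
    q = l2_normalize(query_feat)
    best, best_sim = None, -np.inf
    for pron, feats in support_feats.items():
        proto = l2_normalize(feats.mean(axis=0))  # pronunciation prototype
        sim = float(q @ proto)
        if sim > best_sim:
            best, best_sim = pron, sim
    return best

# Illustrative usage with toy 2-D features and made-up pinyin labels
support = {
    "shuo1": np.array([[1.0, 0.0], [0.9, 0.1]]),
    "shui4": np.array([[0.0, 1.0], [0.1, 0.9]]),
}
print(disambiguate(np.array([0.95, 0.05]), support))
```

Because classification reduces to nearest-prototype comparison, a polyphone unseen during training can still be handled given a few support examples per pronunciation, which matches the abstract's claim of good performance on unseen polyphones.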