Speech synthesis is a technology that converts input text into natural-sounding speech. Compared with traditional speech synthesis, personalized speech synthesis is more complex: it must effectively control personalized information, such as speaker information and accent information, in the synthesized speech while preserving its intelligibility and naturalness. Owing to this complexity and to growing user demand, multi-speaker speech synthesis and accented speech synthesis, which aim to control speaker information and accent information respectively, have become important research topics.

At present, multi-speaker speech synthesis models built around a speaker encoder derived from speaker verification or recognition models can effectively control the speaker information of synthesized speech. However, a speaker encoder trained on a speaker classification task ignores the richness of speech, such as its linguistic information and the speaker's dynamic information, which degrades the naturalness of the synthesized speech. At the same time, the speaker encoder's strong dependence on speaker verification/recognition models limits the further development of multi-speaker speech synthesis. Research on accented speech synthesis is still scarce. During accent transfer learning, traditional end-to-end speech synthesis methods rely heavily on large-scale accent data, make little use of accent prior knowledge, and entangle accent information with other information.

To address these problems, this paper proposes to control speaker information and accent information with deep speech representations, consisting of a rich speech embedding that carries both speaker and linguistic information and a deep accent representation. The main contributions of this paper are as follows:

(1) A new multi-speaker speech synthesis model based on a rich speech embedding. A speech-recognition-based embedding extraction model is used to extract a rich speech embedding containing both speaker and linguistic information, and speaker labels are added during model training to further strengthen the embedding's control over speaker information. Feature visualization together with subjective and objective experiments shows that the proposed model not only controls the speaker information of synthesized speech but also significantly improves its naturalness.

(2) A new accented speech synthesis model based on prior-knowledge guidance and a deep accent embedding. A self-supervised accent encoder that uses speaker labels and tone-related acoustic features as soft labels extracts deep accent representations, and the prediction of tone-related acoustic features is added to the acoustic model of the synthesizer to improve the modeling and control of accent information. Unsupervised data filtering and a progressive data augmentation strategy are adopted throughout model training. Experimental results show that the proposed model can effectively control accent information.

In summary, the personalized speech synthesis technology based on deep speech representations proposed in this paper has both theoretical research value and practical application value.
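To make the conditioning idea behind contributions (1) and (2) concrete, the following is a minimal sketch of an acoustic model that consumes a rich speech embedding and a deep accent embedding and adds an auxiliary head for tone-related acoustic features. It is an illustrative assumption, not the thesis's actual architecture: all module names, dimensions, and the simple LSTM/GRU backbone are hypothetical stand-ins, and the embeddings are represented by random tensors rather than outputs of real ASR-based or self-supervised encoders.

```python
import torch
import torch.nn as nn


class ConditionedAcousticModel(nn.Module):
    """Toy acoustic model conditioned on a rich speech embedding (assumed to come
    from an ASR-based extractor, carrying speaker + linguistic information) and a
    deep accent embedding (assumed to come from a self-supervised accent encoder).
    All sizes are illustrative."""

    def __init__(self, n_phonemes=64, d_text=256, d_speech=256, d_accent=64,
                 n_mels=80, n_tone_feats=4):
        super().__init__()
        self.phoneme_embedding = nn.Embedding(n_phonemes, d_text)
        self.text_encoder = nn.LSTM(d_text, d_text, batch_first=True)
        # Decoder sees the text encoding concatenated with both utterance-level
        # conditioning vectors at every time step.
        self.decoder = nn.GRU(d_text + d_speech + d_accent, 512, batch_first=True)
        self.mel_head = nn.Linear(512, n_mels)          # main target: mel-spectrogram frames
        self.tone_head = nn.Linear(512, n_tone_feats)   # auxiliary target: tone-related acoustic features

    def forward(self, phoneme_ids, speech_emb, accent_emb):
        text_hidden, _ = self.text_encoder(self.phoneme_embedding(phoneme_ids))
        T = text_hidden.size(1)
        cond = torch.cat([speech_emb, accent_emb], dim=-1)   # (B, d_speech + d_accent)
        cond = cond.unsqueeze(1).expand(-1, T, -1)           # broadcast over time
        dec, _ = self.decoder(torch.cat([text_hidden, cond], dim=-1))
        return self.mel_head(dec), self.tone_head(dec)


# Minimal usage with random tensors standing in for real embeddings.
model = ConditionedAcousticModel()
phonemes = torch.randint(0, 64, (2, 20))    # (batch, phoneme sequence length)
speech_emb = torch.randn(2, 256)            # rich speech embedding per utterance
accent_emb = torch.randn(2, 64)             # deep accent embedding per utterance
mel, tone = model(phonemes, speech_emb, accent_emb)
print(mel.shape, tone.shape)                # torch.Size([2, 20, 80]) torch.Size([2, 20, 4])
```

In this reading, the auxiliary tone head corresponds to adding the prediction of tone-related acoustic features to the acoustic model, so that accent-relevant prosodic cues are modeled explicitly rather than left entangled with other speech information.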