Reference Encoder Based End-to-End Accent Conversion

Posted on: 2021-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: W J Li
Full Text: PDF
GTID: 2518306128975939
Subject: Electronics and Communications Engineering
Abstract/Summary:
Accent conversion is a voice conversion technology that aims to convert a source speaker's voice into a target accent while keeping the original timbre unchanged. It can be widely used in accent correction and accent scoring. Traditional accent conversion methods, including voice morphing, frame pairing and articulatory synthesis, generally suffer from problems with sound quality and timbre. Recently, an end-to-end accent conversion system based on phonetic posteriorgrams (PPGs) has been proposed. It improves audio quality, but it lacks control over intonation and stress and is not stable enough. In addition, the above methods require a parallel corpus of the target accent for inference, which limits their application. A reference encoder is used to provide auxiliary information. It is generally composed of a multi-layer neural network and is widely used in prosody control tasks, such as speech synthesis and voice conversion. The reference encoder transforms compressed acoustic or linguistic features into tone control vectors in order to achieve the expected influence on the generated audio. In view of the above problems in accent conversion, the following improvements are made. First, a reference encoder is added to enhance prosody control, which improves the accent conversion results. Second, the structure of the end-to-end accent conversion model is adjusted to improve the quality and stability of the generated audio. Finally, audio generated by a speech synthesis system for the target accent is used as the target-accent audio, which removes the need for a parallel corpus.
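
The abstract describes the reference encoder only in general terms: a multi-layer network that compresses acoustic or linguistic features into a control vector. The following is a minimal illustrative sketch of that idea, not the architecture used in the thesis. The class name `ReferenceEncoder`, the layer sizes, the tanh activations, the random untrained weights and the mean pooling over time are all assumptions made for illustration.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Build one dense layer as (weights, biases) with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense_tanh(layer, x):
    """Apply a dense layer followed by tanh to a single feature vector."""
    weights, biases = layer
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


class ReferenceEncoder:
    """Hypothetical two-layer encoder: frames -> fixed-size control vector."""

    def __init__(self, n_features, hidden_size, embedding_size, seed=0):
        rng = random.Random(seed)  # fixed seed so the sketch is repeatable
        self.layers = [
            make_layer(n_features, hidden_size, rng),
            make_layer(hidden_size, embedding_size, rng),
        ]
        self.embedding_size = embedding_size

    def encode(self, frames):
        """Map a variable-length list of feature frames to one vector."""
        if not frames:
            raise ValueError("at least one frame is required")
        total = [0.0] * self.embedding_size
        for frame in frames:
            hidden = frame
            for layer in self.layers:
                hidden = dense_tanh(layer, hidden)
            total = [t + h for t, h in zip(total, hidden)]
        # Mean pooling over time (an assumption) removes the length dependence.
        return [t / len(frames) for t in total]


if __name__ == "__main__":
    encoder = ReferenceEncoder(n_features=4, hidden_size=8, embedding_size=3)
    reference = [[0.1, 0.2, 0.3, 0.4] for _ in range(20)]  # toy feature frames
    print(encoder.encode(reference))
```

Pooling over time means that reference utterances of different lengths yield vectors of the same size, so a decoder could be conditioned on the result. In a real system the weights would be learned jointly with the conversion model rather than drawn at random.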
Keywords/Search Tags: Accent conversion, Reference encoder, End-to-end, Speech synthesis, Voice conversion