Research On Vietnamese Speech Synthesis Technology Based On End-to-End

Posted on:2024-09-05

Degree:Master

Type:Thesis

Country:China

Candidate:J B Zhang

Full Text:PDF

GTID:2558307124986229

Subject:Computer Science and Technology

Abstract/Summary:

Speech synthesis is a technology that converts text into speech and is widely used in commercial products. In recent years, market demand for Vietnamese speech synthesis has grown steadily, but synthesized Vietnamese speech still suffers from poor naturalness, weak prosody, and slow synthesis speed. This thesis addresses these problems. Regarding the speech synthesis model, the specific research contents are as follows:

First, an improved autoregressive Vietnamese speech synthesis model based on NAT (Non-Attentive Tacotron) is proposed. To address the overly blurry mel-spectrograms generated by the NAT model, a flow-based network structure is used in the model's post-processing network, improving the naturalness and clarity of the synthesized speech. In the encoder module, dilated convolutions are stacked with ordinary convolutions to capture more context and enrich prosodic information. Comparing the generated mel-spectrogram features, the improved NAT model outperforms the original NAT model, with an MCD (mel-cepstral distortion) value 0.53 lower. Its MOS (mean opinion score) is 0.42 higher than that of Tacotron2, 0.21 higher than that of the original NAT model, and differs from that of real recordings by 0.32, showing that the improved NAT model is better than the original model.

Second, an improved non-autoregressive synthesis model based on VITS is proposed. It targets two problems: the heavy computation in the VITS decoder, which slows synthesis, and the difficulty of finding the best alignment between text tokens and spectrogram frames. An iSTFT-based (inverse short-time Fourier transform) decoder replaces the upsampling structure, greatly reducing the decoder's computational load. In addition, a duration search algorithm is used to obtain the best alignment between text and spectrogram frames. The improved model's MCD value is 0.78 lower than that of the original VITS model, and its synthesis speed is significantly improved. Its MOS is 0.21 higher than that of the original VITS model and differs from that of real recordings by only 0.06, further improving the quality of Vietnamese speech synthesis.

Third, a prototype Vietnamese speech synthesis system was developed. Building on the research above, this thesis designs and implements three modules: Vietnamese text normalization, long-sentence segmentation, and speech synthesis, which together realize the functions of the system. Functional and stress tests show that all three modules work as intended and can be used normally. The system synthesizes high-quality Vietnamese speech whose similarity to real human speech reaches 98%.
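
For readers unfamiliar with the MCD figures quoted in the abstract, the following is a minimal sketch of how mel-cepstral distortion is commonly computed. It is not taken from the thesis. It assumes the reference and synthesized mel-cepstral frames are already time-aligned and that the energy (0th) coefficient has been dropped beforehand. The function name `mel_cepstral_distortion` and its arguments are hypothetical.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB over frame pairs that are assumed to be time-aligned.

    Each frame is a sequence of mel-cepstral coefficients; by assumption the
    0th (energy) coefficient has already been removed by the caller.
    """
    if len(ref_frames) != len(syn_frames) or not ref_frames:
        raise ValueError("need two non-empty, equally long frame sequences")

    # Standard scaling constant: (10 / ln 10) * sqrt(2).
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        if len(ref) != len(syn):
            raise ValueError("frames must have the same number of coefficients")
        squared_error = sum((r - s) ** 2 for r, s in zip(ref, syn))
        total += scale * math.sqrt(squared_error)

    # Average the per-frame distortion over the utterance.
    return total / len(ref_frames)


# Identical sequences have zero distortion.
print(mel_cepstral_distortion([[0.1, 0.2]], [[0.1, 0.2]]))  # 0.0
```

A lower value means the synthesized spectrum is closer to the reference, which is why the abstract reports reductions in MCD as improvements. Real evaluations usually align the two utterances first (for example with dynamic time warping), a step this sketch leaves out.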

Keywords/Search Tags:

Vietnamese, NAT, VITS, Speech Synthesis, Autoregressive, Non-autoregressive, Dilated Convolution

Related items

1	Research On End-to-End Non-Autoregressive Model-Based Amdo Tibetan Speech Synthesis Technology
2	Study On Chinese Speech Synthesis Methods Based On Deep Learning
3	Deep Neural Network Acoustic Modeling For Efficient Speech Synthesis
4	Trainable HMM-Based Vietnamese Speech Synthesis System
5	An Algorithm For Network Traffic Predicting Based On Wavelet Transform And Autoregressive Model
6	Research On Speech Synthesis Technology For Chinese Advertisement Text
7	Research Of Neural Machine Translation Based On Non-autoregressive Method
8	Real-time Human Motion Recognition And Prediction Based On Autoregressive Learning Algorithm
9	EEG Personal Identification Based On Brain Functional Network And Autoregressive Model And Its Application
10	Research On Text-generative Steganography Method Based On Non-Autoregressive Model