
Improved Tacotron2 Speech Synthesis Method Based On Forced Monotonic Attention Mechanism

Posted on: 2022-10-12
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Wang
Full Text: PDF
GTID: 2518306572450884
Subject: Computer Science and Technology
Abstract/Summary:
Speech is one of the most important ways for human beings to receive external information and express their own thoughts, and it plays an indispensable role in daily life. Speech technology has many applications; among them, speech synthesis is the technology of converting text information into a speech signal. Research on speech synthesis has a long history and has produced substantial results. In recent years, with the growth of computing power and the development of deep learning, speech synthesis has advanced rapidly, and many novel and efficient synthesis methods have emerged, among which end-to-end models have received wide attention. However, the attention mechanisms used in these models evolved from computer vision and machine translation and are not fully suited to the speech synthesis task; an unsuitable attention mechanism can severely degrade synthesis quality. An attention mechanism applied to speech synthesis needs to be monotonic, yet current research on how to enforce this requirement remains insufficient. This thesis therefore focuses on improving the attention mechanism in the end-to-end model to guarantee its monotonicity and make it more suitable for speech synthesis.

First, Tacotron 2, a representative end-to-end speech synthesis model, is analyzed, and its attention mechanism is studied in detail.

Then, an independent forced-monotonicity method is proposed, in which constraint vectors are designed purely from the monotonicity requirement. To guarantee monotonicity, a constraint vector is constructed at each decoder time step, and a neural network predicts a weight that dynamically adjusts the strength of the constraint. The attention vector at each step is the weighted sum of the output of the original attention mechanism and the constraint vector.

Finally, a method is proposed that uses phoneme duration information, which carries stronger speech characteristics, to guide the training of the attention mechanism. The duration prediction module of a traditional parametric speech synthesis system provides the required durations, which are converted into an alignment matrix similar to the attention matrix through a series of operations such as expansion according to frame length and relaxation around the peak position. By reducing the distance between the attention matrix and this alignment matrix, both the convergence speed of the attention mechanism and the correctness of the alignment information it represents are improved markedly.
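The abstract does not give implementation details, so the following is only a minimal sketch of the forced-monotonicity idea as described above: a constraint vector is built at each decoder step, a small network predicts a mixing weight, and the final attention weights are the weighted sum of the original attention output and the constraint vector. The class and argument names, the windowed form of the constraint vector, and the sigmoid gate are all assumptions, not the thesis's actual design.

```python
import torch
import torch.nn as nn

class ForcedMonotonicAttention(nn.Module):
    """Sketch: blend a base attention distribution with a monotonic
    constraint vector whose weight is predicted from the decoder state.
    (Window width and gate network are illustrative assumptions.)"""

    def __init__(self, query_dim, window=3):
        super().__init__()
        self.window = window
        # small network that predicts the mixing weight g in [0, 1]
        self.gate = nn.Sequential(nn.Linear(query_dim, 1), nn.Sigmoid())

    def forward(self, base_weights, prev_peak, query):
        # base_weights: (B, T_enc) output of the original attention mechanism
        # prev_peak:    (B,) index of the previous step's attention peak
        # query:        (B, query_dim) current decoder state
        B, T = base_weights.shape
        positions = torch.arange(T, device=base_weights.device).unsqueeze(0)  # (1, T)
        # constraint vector: uniform mass on [prev_peak, prev_peak + window],
        # i.e. only positions at or slightly ahead of the previous peak are allowed
        mask = (positions >= prev_peak.unsqueeze(1)) & \
               (positions <= prev_peak.unsqueeze(1) + self.window)
        constraint = mask.float()
        constraint = constraint / constraint.sum(dim=1, keepdim=True)
        # dynamically predicted weight controls how strongly the constraint applies
        g = self.gate(query)                                  # (B, 1)
        weights = (1.0 - g) * base_weights + g * constraint   # weighted sum
        return weights / weights.sum(dim=1, keepdim=True)
```

Likewise, the duration-guided training step can be pictured as building an alignment matrix from per-phoneme frame counts and penalizing its distance to the attention matrix. The Gaussian relaxation width, the L1 distance, and the function names below are assumptions chosen for illustration; the thesis only states that the duration information is expanded by frame length, relaxed around the peak position, and compared with the attention matrix.

```python
import torch
import torch.nn.functional as F

def duration_to_alignment(durations, n_frames, sigma=2.0):
    """Expand per-phoneme durations (in frames) into a soft alignment matrix
    of shape (n_frames, n_phonemes); sigma is an assumed relaxation width."""
    ends = torch.cumsum(durations.float(), dim=0)
    starts = ends - durations.float()
    centres = (starts + ends) / 2.0                         # peak frame of each phoneme
    frames = torch.arange(n_frames).float().unsqueeze(1)    # (n_frames, 1)
    # Gaussian relaxation around each phoneme's centre frame
    align = torch.exp(-((frames - centres.unsqueeze(0)) ** 2) / (2 * sigma ** 2))
    return align / align.sum(dim=1, keepdim=True)

def guided_attention_loss(attention, durations):
    """L1 distance between the model's attention matrix (n_frames, n_phonemes)
    and the duration-derived alignment matrix."""
    target = duration_to_alignment(durations, attention.shape[0]).to(attention.device)
    return F.l1_loss(attention, target)
```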
Keywords/Search Tags: Speech synthesis, End-to-end model, Attention mechanism, Long short-term memory network, Phoneme duration prediction