Research On Deep Learning Based Small-Sized Unit Concatenation Speech Synthesis

Posted on: 2018-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: Z P Zhou
Full Text: PDF
GTID: 2348330512485641
Subject: Signal and Information Processing
Abstract/Summary:
Speech synthesis technology aims to convert input information such as text into speech waveforms. Statistical parametric speech synthesis and corpus-based unit selection with waveform concatenation are the two mainstream approaches at present. The former has advantages such as automatic system construction and highly smooth synthesized speech. However, limited by the parametric synthesizer and other factors, the naturalness of its synthesized speech is not good enough. An effective way to improve that naturalness is to use frame-sized speech segments in unit selection and waveform concatenation under the guidance of statistical acoustic models. The traditional small-sized unit selection method adopts the hidden Markov model (HMM) for acoustic modeling and cost calculation. In recent years, deep learning methods, e.g., deep neural networks (DNNs), have demonstrated better performance than HMMs in the acoustic modeling of statistical parametric speech synthesis. Therefore, this dissertation focuses on small-sized unit concatenation speech synthesis based on deep learning. On the one hand, neural-network-based acoustic modeling for guiding small-sized unit selection is investigated; using DNN and recurrent neural network (RNN) model structures improves on the modeling accuracy of the traditional HMM and on the quality of the synthesized speech. On the other hand, a speech synthesis approach combining unit selection and parameter generation is proposed. This approach generates excitation characteristic waveforms (CWs) by unit selection, which improves the modeling of phase and other excitation information in traditional statistical parametric speech synthesis, as well as the naturalness of the synthesized speech. The main work of this dissertation is as follows:

Firstly, this dissertation proposes a DNN-based unit selection and waveform concatenation speech synthesis approach using frame-sized speech segments. This approach uses a DNN for acoustic modeling to calculate the target costs and the concatenation costs in unit selection. It improves the accuracy of the model and the quality of the synthesized speech compared with the traditional HMM.

Then, this dissertation investigates an RNN-based unit selection and waveform concatenation speech synthesis approach using small-sized speech segments. On the one hand, this approach uses an RNN with long short-term memory (LSTM) cells for acoustic modeling, improving on the time-series modeling ability of the DNN. On the other hand, a multi-frame unit selection strategy is applied to reduce the number of concatenation points. The approach achieves better naturalness of synthesized speech than the DNN-based unit selection and waveform concatenation method using frame-sized speech segments.

Finally, this dissertation implements a parametric synthesis approach that incorporates excitation CWs generated by unit selection. This approach first characterizes and acoustically models the CWs extracted from speech waveforms. At the synthesis stage, it generates the high-frequency component of the excitation CWs using the frame concatenation method and, at the same time, predicts filter coefficients using the parameter generation method. The speech waveforms are then produced by filtering. The experimental results show that the proposed method is effective in improving the naturalness of speech synthesized by the traditional parametric synthesis approach.
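
The unit selection step mentioned in the abstract, which picks one candidate segment per frame so that the sum of target costs and concatenation costs is minimal, is commonly solved by a Viterbi-style dynamic programming search. The following is only a minimal sketch of that general idea, not the dissertation's implementation. It assumes the costs are already available as arrays (in the dissertation they would be derived from the DNN or LSTM acoustic models), and the names `select_units`, `target_cost`, and `concat_cost` are hypothetical.

```python
import numpy as np


def select_units(target_cost, concat_cost):
    """Pick one candidate per frame with minimal total cost (Viterbi search).

    target_cost: array of shape (T, K); cost of candidate k at frame t.
    concat_cost: array of shape (T - 1, K, K); cost of joining candidate i
                 at frame t to candidate j at frame t + 1.
    Both arrays are assumed inputs for illustration only.
    Returns (path, total_cost), where path is a list of T candidate indices.
    """
    target_cost = np.asarray(target_cost, dtype=float)
    concat_cost = np.asarray(concat_cost, dtype=float)
    T, K = target_cost.shape

    acc = target_cost[0].copy()          # best cost of paths ending at each candidate
    back = np.zeros((T, K), dtype=int)   # back-pointers for path recovery

    for t in range(1, T):
        # total[i, j]: best path ending in i at t - 1, plus the join i -> j
        total = acc[:, None] + concat_cost[t - 1]
        back[t] = np.argmin(total, axis=0)
        acc = total[back[t], np.arange(K)] + target_cost[t]

    # Trace the best path backwards from the cheapest final candidate.
    path = [int(np.argmin(acc))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return path, float(acc.min())


# Toy example: 3 frames, 2 candidates per frame.
toy_target = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
# Switching candidates between frames costs 5, staying costs 0.
toy_concat = np.tile(np.array([[0.0, 5.0], [5.0, 0.0]]), (2, 1, 1))
toy_path, toy_cost = select_units(toy_target, toy_concat)
```

In this toy case the high switching cost makes the search keep the same candidate across all frames even though another candidate has a lower target cost at the middle frame, which is the trade-off between matching the target and avoiding audible joins. A multi-frame selection strategy, as described above, would reduce the number of joins by selecting runs of consecutive frames as single units.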
Keywords/Search Tags: Speech Synthesis, Parametric Synthesis, Unit Selection, Deep Neural Network, Recurrent Neural Network