Research On Tibetan Speech Synthesis Based On Deep Learning

Posted on:2023-05-04

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Zhang

Full Text:PDF

GTID:2558307154974669

Subject:Engineering

Abstract/Summary:

Speech synthesis generates highly intelligible, natural-sounding speech from input text. Tibetan speech synthesis, specifically, converts Tibetan text into natural Tibetan speech that listeners can understand. Speech synthesis plays an important role in human-computer interaction, and with the development of deep learning it has been widely applied in voice navigation, intelligent interactive devices, the Internet of Things, e-book education, and other fields. However, because Tibetan text corpora and speech databases are small and research on Tibetan language informatization is still limited, Tibetan speech synthesis lags behind English and Putonghua and leaves considerable room for research and development. Current Tibetan speech synthesis has problems in both naturalness and intelligibility: prosodic relationships are not clearly expressed and the synthesized audio is not robust, which results in low sound quality and poor expressiveness.

To address these problems, this thesis uses deep learning to improve both the front end and the back end of Tibetan speech synthesis. The work covers three aspects:

1) Based on the requirements of neural speech synthesis systems, a synthesis corpus suited to the characteristics of Amdo Tibetan is designed using phoneme balancing, Tibetan word frequency analysis, analysis of Tibetan word formation, statistical analysis of high-frequency words, and related methods. The corpus is recorded in a professional recording environment with professional equipment and a systematic procedure, yielding a 60-hour Amdo Tibetan speech synthesis database.

2) An Amdo Tibetan pronunciation dictionary is combined with the Wylie transliteration scheme to form a transcription scheme that reflects the characteristics of the Amdo dialect, and this scheme is applied to the front end of Tibetan speech synthesis. Phoneme vectors carrying Amdo dialect information are derived from the transcription scheme and combined with a BiLSTM-CRF prosody prediction model. This strengthens the representation of external information and thereby improves the naturalness of Amdo Tibetan speech synthesis.

3) In the end-to-end framework of Tibetan speech synthesis, the word embedding layer's ability to capture information is improved and its independence is reduced, and combining it with Amdo Tibetan phoneme vectors improves the intelligibility and naturalness of the synthesized speech. Using the self-built large-scale Amdo Tibetan speech synthesis database and a purpose-designed pre-trained model, the quality of Ü-Tsang (Wei-Zang) Tibetan speech synthesis in a low-resource setting is also improved. Experiments show that this method effectively improves the naturalness of Tibetan speech synthesis.

In summary, objective and subjective evaluations of the proposed methods show that the naturalness of Tibetan speech synthesis is improved, which benefits the development of intelligent speech information processing in Tibetan areas.
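The abstract names phoneme balancing as one criterion for corpus design but does not describe the selection procedure. The following is only a minimal sketch of one common approach, greedy sentence selection that favors phonemes still rare in the selection. It is not the thesis's method: the function name `select_phoneme_balanced`, the length-normalized score, and the toy phoneme symbols are all assumptions made for illustration.

```python
from collections import Counter


def select_phoneme_balanced(candidates, budget):
    """Greedily choose up to `budget` sentence ids from `candidates`.

    `candidates` maps a sentence id to its phoneme sequence (a list of
    strings). Hypothetical helper for illustration only.
    """
    counts = Counter()  # phoneme counts over the sentences selected so far
    # Ignore sentences with no phonemes so the score never divides by zero.
    remaining = {sid: ph for sid, ph in candidates.items() if ph}
    selected = []

    def score(phonemes):
        # Assumed scoring rule: a phoneme seen n times so far contributes
        # 1 / (1 + n); averaging over the sentence avoids favoring length.
        return sum(1.0 / (1 + counts[p]) for p in phonemes) / len(phonemes)

    while remaining and len(selected) < budget:
        # Ties are broken by insertion order of `candidates`.
        best = max(remaining, key=lambda sid: score(remaining[sid]))
        selected.append(best)
        counts.update(remaining.pop(best))
    return selected, counts


# Toy corpus with placeholder phoneme symbols (not real Amdo transcriptions).
toy_corpus = {
    "s1": ["k", "a"],
    "s2": ["k", "a", "k", "a"],
    "s3": ["t", "i"],
    "s4": ["k", "i"],
}
chosen, phoneme_counts = select_phoneme_balanced(toy_corpus, budget=2)
print(chosen, dict(phoneme_counts))
```

In a real corpus design this score would typically be combined with the other criteria the abstract lists, such as word frequency and high-frequency word coverage. How the thesis weighs them is not stated here.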

Keywords/Search Tags:

Tibetan Speech, Speech Synthesis, Speech Synthesis Corpus, Acoustic Modelling, Deep Learning, Prosody Modelling

Related items

1	Mongolian Speech Synthesis Based On Deep Learning
2	Investigating The Key Problems In Deep Learning Based Acoustic Modeling For Speech Synthesis
3	Research On Personalized Speech Synthesis Based On Deep Speech Representations
4	Research On Acoustic Modelling And Text Generation In Concept-to-Speech Conversion
5	Research On Chinese Speech Synthesis Method Integrating Pause And Personal Information
6	Research On Deep Learning Based End-to-End Chinese Speech Synthesis
7	An Improved Speech Synthesis Method
8	The Research Of Speech Synthesis And Prosody Control In Wu-Dialect Text-to-Speech
9	Create An Emotional Speech Synthesis Corpus
10	Corpus Supported English Text To Speech Synthesis Engine