Research On Tibetan Speech Synthesis Based On Deep Learning

Posted on: 2023-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Zhang
GTID: 2558307154974669
Subject: Engineering
Abstract/Summary:
Speech synthesis technology generates highly intelligible, natural-sounding speech from input text. Tibetan speech synthesis converts Tibetan text into natural Tibetan speech that listeners can understand. Speech synthesis plays an important role in human-computer interaction, and with the development of deep learning it has been widely applied in voice navigation, intelligent interactive devices, the Internet of Things, e-book education, and other fields. However, because Tibetan text corpora and speech databases are small and in-depth research on Tibetan information processing is lacking, Tibetan speech synthesis lags behind English and Putonghua and leaves considerable room for research and development. Current Tibetan speech synthesis still has problems with naturalness and intelligibility: prosodic structure is weakly expressed and the synthesized audio is not robust, which results in low sound quality and poor expressiveness. To address these problems, this thesis uses deep learning to improve both the front end and the back end of Tibetan speech synthesis. The work covers three aspects:

1) Based on the requirements of neural-network speech synthesis systems, a synthesis corpus suited to the characteristics of Amdo Tibetan is designed using phoneme balancing, Tibetan word-frequency analysis, analysis of Tibetan word formation, and statistical analysis of high-frequency words. The corpus is recorded in a professional recording environment with professional equipment and systematic methods, yielding a 60-hour Amdo Tibetan speech database for speech synthesis.

2) By combining an Amdo Tibetan pronunciation dictionary with the Wylie transliteration scheme, a transcription scheme reflecting the characteristics of the Amdo dialect is developed and applied to the front end of Tibetan speech synthesis. The transcription scheme produces phoneme vectors that carry Amdo dialect information. Combining these vectors with a BiLSTM-CRF prosody prediction model strengthens the representation of external information and thereby improves the naturalness of Amdo Tibetan speech synthesis.

3) Within an end-to-end Tibetan speech synthesis framework, combining Amdo Tibetan phoneme vectors with the word-embedding layer improves that layer's ability to capture information and reduces its independence, which improves the intelligibility and naturalness of the synthesized speech. In addition, a model pre-trained on the large-scale, self-designed Amdo Tibetan speech synthesis database is used to improve Ü-Tsang (Wei) Tibetan speech synthesis in a low-resource setting. Experiments show that this method effectively improves the naturalness of Tibetan speech synthesis.

In summary, objective and subjective evaluations of the proposed methods show that the naturalness of Tibetan speech synthesis is clearly improved, which benefits the development of intelligent speech information processing in Tibetan areas.
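The abstract names phoneme balancing as one of the corpus design criteria but does not say how it was carried out. As an illustration only, the sketch below shows one common greedy approach: repeatedly pick the candidate sentence whose phonemes are currently least covered. The function name `select_balanced`, the scoring rule, and the phoneme symbols are all assumptions made for this example; they are not taken from the thesis and do not represent the real Amdo phoneme inventory.

```python
from collections import Counter


def select_balanced(candidates, k):
    """Greedily choose up to k sentence ids for phoneme coverage.

    candidates: dict mapping a sentence id to its list of phoneme symbols
                (hypothetical symbols, not a real Amdo inventory).
    Returns (chosen_ids, phoneme_counts).
    """
    counts = Counter()           # how often each phoneme has been selected so far
    chosen = []
    remaining = dict(candidates)

    def gain(item):
        _, phones = item
        # Phonemes seen less often so far contribute more to the score;
        # dividing by length avoids always preferring long sentences.
        return sum(1.0 / (1 + counts[p]) for p in phones) / max(len(phones), 1)

    while remaining and len(chosen) < k:
        best_id, best_phones = max(remaining.items(), key=gain)
        chosen.append(best_id)
        counts.update(best_phones)
        del remaining[best_id]
    return chosen, counts


if __name__ == "__main__":
    demo = {
        "s1": ["k", "a"],
        "s2": ["k", "a"],
        "s3": ["ng", "o"],
    }
    ids, phoneme_counts = select_balanced(demo, 2)
    print(ids, dict(phoneme_counts))
```

A real corpus design would also weigh the other criteria listed in the abstract, such as word frequency and word formation, which this sketch leaves out.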
Keywords/Search Tags: Tibetan Speech, Speech Synthesis, Speech Synthesis Corpus, Acoustic Modelling, Deep Learning, Prosody Modelling