Study On Automatic Construction Of Speech Database

Posted on: 2011-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: M H Pang
GTID: 2178330332464803
Subject: Computer software and theory
Abstract/Summary:
With the development of speech synthesis technology, speech synthesis systems have begun to see large-scale application. In particular, the steadily improving quality of speech synthesized by Trainable TTS, together with its small storage footprint, which makes it especially suitable for embedded speech synthesis, has greatly promoted the industrial development of speech synthesis systems. Against this background, higher requirements have been placed on the speech database of a speech synthesis system, particularly as the applications of speech synthesis diversify. For different applications, for example the different accents of different regions, a speech synthesis system often needs to be rebuilt. The traditional method of constructing the speech database is manual and has many shortcomings: a long construction cycle, unsatisfactory consistency, and high resource consumption. Moreover, voices built from such a database lack expressiveness.

It is therefore of considerable academic and practical value to study the construction of a speech database that is built automatically in a short time, with minimal manual intervention, and that meets the requirements of diversified speech synthesis. This thesis studies the automatic construction of speech databases for HMM-based Trainable speech synthesis systems in depth and systematically, including the construction framework, the key technologies, and the related applications. The detailed research work and results are as follows:

(1) A music detection method based on an audio classification algorithm is proposed. It removes audio files that contain music and retains pure speech audio. The audio classification algorithm is based on a Gaussian mixture model (GMM) and a variable duration hidden Markov model (VDHMM). First, the algorithm classifies each audio frame with the GMM. It then merges the classified frames into segments under the maximum likelihood criterion, using the Viterbi algorithm of the VDHMM.

(2) An automatic sentence segmentation algorithm is proposed. First, the algorithm trains phoneme hidden Markov models through an HMM-based flat-start approach. Second, it aligns the phoneme sequences with the text through forced alignment. Third, it segments the multi-paragraph speech into sentences at sentence terminators such as the full stop, question mark, and exclamation mark. Finally, it judges the correctness of each terminator through a checking mechanism, and the correct sentences are obtained.

(3) An improved sentence segmentation algorithm is proposed, which achieves higher segmentation accuracy and yields more correct sentences. The improved algorithm is iterative and consists of: 1) segmenting multi-paragraph speech into paragraph speech and sentence speech at the correct sentence terminators; 2) training more accurate phoneme HMMs on the resulting sub-paragraphs and sentences, and aligning the phoneme sequences with the text of these sub-paragraphs and sentences through forced alignment; 3) segmenting the resulting paragraph speech and sentence speech at the correct terminators. Steps 1) to 3) are repeated until no more sub-paragraphs or sentences can be cut out.

(4) A method of corpus construction under limited text, based on the Okapi formula, is proposed.

Experiments show that these methods can construct a speech corpus from broadcast reports in a short time with minimal manual intervention.
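
As an illustration of the frame-merging idea in item (1), the following is a minimal sketch of Viterbi decoding with explicit segment durations. It is not the thesis implementation. It assumes the per-frame log-likelihoods have already been computed by one GMM per class (the GMMs themselves are not shown), it uses a uniform duration prior between `min_dur` and `max_dur` in place of a trained duration model, and it ignores class transition probabilities. The function name `segment_frames` and its parameters are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Segment = Tuple[int, int, int]  # (class index, start frame, end frame exclusive)


def segment_frames(frame_loglik: Sequence[Sequence[float]],
                   min_dur: int = 3,
                   max_dur: Optional[int] = None) -> List[Segment]:
    """Merge frames into class-labelled segments of bounded duration.

    frame_loglik[c][t] is the log-likelihood of frame t under class c
    (assumed to come from a per-class GMM). Adjacent segments must have
    different classes. Duration prior is uniform (an assumption).
    """
    n_classes = len(frame_loglik)
    n_frames = len(frame_loglik[0])
    if max_dur is None:
        max_dur = n_frames
    neg_inf = float("-inf")

    # prefix[c][t] = sum of log-likelihoods of frames [0, t) under class c
    prefix = [[0.0] * (n_frames + 1) for _ in range(n_classes)]
    for c in range(n_classes):
        for t in range(n_frames):
            prefix[c][t + 1] = prefix[c][t] + frame_loglik[c][t]

    # best[t][c]: best score for frames [0, t) when the last segment has class c
    best = [[neg_inf] * n_classes for _ in range(n_frames + 1)]
    back = [[None] * n_classes for _ in range(n_frames + 1)]

    for t in range(1, n_frames + 1):
        for c in range(n_classes):
            for d in range(min_dur, min(max_dur, t) + 1):
                s = t - d  # start of the candidate last segment
                if s == 0:
                    prev_score, prev_c = 0.0, None
                else:
                    prev_score, prev_c = max(
                        ((best[s][p], p) for p in range(n_classes) if p != c),
                        default=(neg_inf, None),
                    )
                if prev_score == neg_inf:
                    continue
                cand = prev_score + prefix[c][t] - prefix[c][s]
                if cand > best[t][c]:
                    best[t][c] = cand
                    back[t][c] = (s, prev_c)

    last = max(range(n_classes), key=lambda c: best[n_frames][c])
    if best[n_frames][last] == neg_inf:
        raise ValueError("no segmentation satisfies the duration limits")

    # Follow back-pointers from the final frame to recover the segments.
    segments: List[Segment] = []
    t, c = n_frames, last
    while t > 0:
        s, prev_c = back[t][c]
        segments.append((c, s, t))
        t, c = s, prev_c
    segments.reverse()
    return segments
```

With a minimum duration of several frames, a single frame that looks like music inside a speech region is absorbed into the surrounding speech segment instead of producing a spurious one-frame segment, which is the practical benefit of modelling duration explicitly.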
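
The terminator-based cutting in item (2) can be sketched as follows, assuming a forced aligner has already produced start and end times for each transcript token. The abstract does not describe the checking mechanism, so this sketch substitutes a simple stand-in: a terminator is accepted as a cut point only if it is followed by a pause of at least `min_pause` seconds. The `AlignedToken` type, the `split_at_terminators` function, and the pause threshold are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class AlignedToken(NamedTuple):
    text: str     # transcript token, with any trailing punctuation attached
    start: float  # start time in seconds (assumed to come from forced alignment)
    end: float    # end time in seconds


TERMINATORS = (".", "?", "!", "。", "？", "！")


def split_at_terminators(tokens: List[AlignedToken],
                         min_pause: float = 0.2) -> List[Tuple[str, float, float]]:
    """Cut aligned speech at sentence terminators that pass a pause check.

    Returns (text, start, end) for each segment. A terminator that fails the
    check is not used as a cut point, so a segment may span several sentences.
    """
    segments = []
    current: List[AlignedToken] = []
    for i, tok in enumerate(tokens):
        current.append(tok)
        if not tok.text.endswith(TERMINATORS):
            continue
        is_last = i == len(tokens) - 1
        # Stand-in check (an assumption): require a pause after the terminator.
        pause = float("inf") if is_last else tokens[i + 1].start - tok.end
        if pause >= min_pause:
            text = " ".join(t.text for t in current)
            segments.append((text, current[0].start, current[-1].end))
            current = []
    if current:  # trailing tokens with no accepted terminator
        text = " ".join(t.text for t in current)
        segments.append((text, current[0].start, current[-1].end))
    return segments
```

Segments that still contain more than one sentence correspond to the sub-paragraphs that the iterative procedure in item (3) would realign and cut again.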
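
The abstract does not say how the Okapi formula in item (4) is applied to corpus construction. For reference only, the following is a minimal sketch of the widely used Okapi BM25 scoring function, here ranking tokenized candidate sentences against a list of target units. The function name `bm25_scores`, the default values `k1=1.2` and `b=0.75`, and the non-negative IDF variant `ln((N - n + 0.5) / (n + 0.5) + 1)` are assumptions, not details taken from the thesis.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(query: Sequence[str],
                docs: Sequence[Sequence[str]],
                k1: float = 1.2,
                b: float = 0.75) -> List[float]:
    """Score each tokenized document against the query terms with Okapi BM25."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    # Average document length; fall back to 1.0 if every document is empty.
    avg_len = sum(len(d) for d in docs) / n_docs or 1.0

    # doc_freq[term] = number of documents that contain the term
    doc_freq: Counter = Counter()
    for d in docs:
        doc_freq.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1.0 - b + b * len(d) / avg_len)
        score = 0.0
        for term in query:
            f = tf[term]
            if f == 0:
                continue
            n = doc_freq[term]
            idf = math.log((n_docs - n + 0.5) / (n + 0.5) + 1.0)
            score += idf * f * (k1 + 1.0) / (f + norm)
        scores.append(score)
    return scores
```

In a corpus-selection setting, one plausible use is to treat the units still missing from the corpus as the query and pick the highest-scoring candidate sentence, though whether the thesis does this is not stated.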
Keywords/Search Tags: speech synthesis, speech database, sentence segmentation, audio classification