Embedded Speech Synthesis Based On Initial And Final Units

Posted on: 2017-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: B J Li
GTID: 2308330482987298
Subject: Electronic Science and Technology
Abstract/Summary:
With the development of artificial intelligence, embedded speech synthesis is becoming one of the most natural means of human-computer interaction and has broad application prospects. Because large-scale speech synthesis places high demands on computing speed and storage capacity, current embedded speech synthesis devices fall into two types: one relies on the Internet and cloud computing and cannot be used offline; the other uses a voice chip, which can perform simple speech synthesis offline but has limited uses. In addition, because building a large-scale corpus involves a heavy workload, customizing the corpus has become a challenge.

To address the problems caused by large-scale corpora, this thesis did not adopt the mainstream large-scale-corpus approach and instead used initials and finals as the basic synthesis units. Initials and finals were segmented and screened from continuous speech, and only a moderate number of phonological samples were kept in the corpus. This approach offered clear advantages in storage space, unit matching, and the construction of customized, personalized corpora. The work covered the following three aspects:

(1) Initial and final segmentation in continuous speech: A segmentation method was proposed that combines voiced sound detection, statistical rules for the length of the initial segment, and boundary features of the auditory spectrum. First, an autocorrelation function and a cost function were established so that voiced sound could be detected by dynamic programming. Then, within the range given by the statistical rules for the initial segment length, the point of abrupt change in the boundary features of the auditory spectrum was located. Finally, the initial and the final were segmented at that point. The experimental results showed that this method improved segmentation accuracy while avoiding the influence of initial consonant pronunciation, phonetic change, and noise on the segmentation.

(2) Establishment of the initial and final corpus: Sample models of the initials and finals were designed for the corpus. First, the coarticulation rules for different initial-final combinations were summarized. Based on these rules, the initials and finals were classified and the corpus content was designed. The initials and finals required by the corpus were then extracted from continuous speech and proofread manually. Finally, the corrected units were named uniformly and assembled into an initial and final corpus.

(3) Initial and final speech synthesis: An improved TD-PSOLA method was applied to initial and final speech synthesis. First, prosodic modification parameters for synthesis were introduced. Building on syllable synthesis, word rhythm models were designed, and a set of prosodic symbols was designed as a supplement for more complex pronunciations. The experimental results showed that the proposed speech synthesis method based on initials and finals achieved high accuracy and naturalness.
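
As a rough illustration of the voiced sound detection step in aspect (1), the sketch below classifies a single frame as voiced or unvoiced from its normalized autocorrelation. It is a simplified stand-in, not the method of the thesis: the abstract does not give the cost function or the dynamic programming formulation, so both are omitted here, and the function names, the pitch range (`f0_min`, `f0_max`), and the `threshold` value are assumptions chosen for illustration.

```python
import math


def normalized_autocorrelation(frame, lag):
    """Normalized autocorrelation of `frame` at `lag`, in the range [-1, 1]."""
    n = len(frame) - lag
    num = sum(frame[i] * frame[i + lag] for i in range(n))
    energy_a = sum(frame[i] ** 2 for i in range(n))
    energy_b = sum(frame[i + lag] ** 2 for i in range(n))
    denom = math.sqrt(energy_a * energy_b)
    return num / denom if denom > 0 else 0.0


def detect_voicing(frame, sample_rate, f0_min=60.0, f0_max=400.0, threshold=0.5):
    """Return (voiced, f0_estimate_hz, peak_score) for one frame.

    Hypothetical helper: the pitch range and threshold are illustrative
    assumptions, not values taken from the thesis.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    # Search the plausible pitch-period range for the strongest periodicity.
    for lag in range(min_lag, max_lag + 1):
        score = normalized_autocorrelation(frame, lag)
        if score > best_score:
            best_lag, best_score = lag, score
    voiced = best_score >= threshold
    f0 = sample_rate / best_lag if voiced and best_lag > 0 else 0.0
    return voiced, f0, best_score
```

A per-frame decision like this is prone to octave errors and to isolated misclassifications in noise. Smoothing the decisions across frames, for example with a cost function and dynamic programming as the abstract describes, is the usual way to reduce such errors.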
Keywords/Search Tags:Speech Synthesis, TD-PSOLA, Word Rhythm Model, Initial and Final Segmentation, Corpus Establishment