
A Research On Identification Of Chinese Prosodic Phrase Boundary Based On Chinese Chunk

Posted on: 2016-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z R Feng
Full Text: PDF
GTID: 2308330482450606
Subject: Computer application technology
Abstract/Summary:
With the rapid development of technology, people increasingly prefer to take in information from pictures and sound rather than from text. Speech technology is already applied in many areas of daily life, and it is no longer a dream that machines may one day communicate with human beings. However, the quality of speech synthesis still needs to be improved: synthesized speech suffers mainly from low fluency and a poor sense of rhythm. Improving the fluency of synthesized speech is therefore an urgent problem.

In recognizing prosodic structure, we focus on the level that is hardest to predict, the prosodic phrase. Prosodic structure is based on syntactic structure. From a large corpus annotated with prosodic structure, we found that there are certain relationships between prosodic structure and syntactic structure. A chunk reflects some syntactic information, people often use chunks as the basic unit when reading or speaking, and chunk segmentation groups syntactically related words together. In this thesis, we therefore apply chunks, a shallow syntactic structure without recursive nesting, to prosodic phrase prediction.

The main work of this thesis includes the following aspects:

(1) Definition and acquisition of Chinese chunks
Most previous work on prosodic phrase identification used the word, its part of speech, and its length as features. Taking into account the limitations of these features and the relationship between prosodic structure and syntactic structure, we defined 8 types of chunk for prosodic phrase recognition, based on a comparison of prosodic and syntactic structure. We also summarized the processing rules for chunks: chunks are first given an initial annotation, the closeness between neighboring chunks is then measured, and closely related chunks are merged to obtain the final chunk structure.

(2) Identification of prosodic phrases based on conditional random fields
We apply the non-recursively nested shallow syntactic structure, the chunk, to prosodic phrase prediction, and propose a method that combines conditional random fields (CRFs) with chunks to recognize prosodic phrases. First, we use CRFs to extract the corresponding features from the training corpus and train a model automatically. The test corpus is then fed into the trained model to identify prosodic phrases.

(3) Identification of prosodic phrases based on the AdaBoost algorithm and chunks
Finding a single strong classification algorithm for prosodic phrase identification is very difficult. Because strong and weak learnability are equivalent, an ensemble learning method can combine, by weight, several weak classifiers whose accuracy is only slightly better than random guessing into a strong learner that achieves better classification results than a single strong classifier. We therefore use a typical ensemble learning algorithm, AdaBoost, as the model for identifying prosodic phrases, with SVM as the base classifier. The training corpus is sampled at random repeatedly, and the sample weights are updated after each round. This series of training rounds generates many base classifiers, which are then merged by weighted voting into a new strong classifier that performs the prosodic phrase prediction.

We built prosodic phrase recognition models using CRFs, CRFs with chunks, SVM, SVM with chunks, AdaBoost-SVM, and AdaBoost-SVM with chunks, and compared their performance, in particular whether chunks improve the prosodic phrase recognition model. The experimental results show that, for each method, the variant that incorporates chunks clearly outperforms the corresponding model without chunks. This shows that chunk information can be used in the recognition of prosodic structure and makes an effective contribution.
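The AdaBoost procedure outlined in (3), which resamples the training data by weight, trains a base classifier, updates the weights, and merges the classifiers by weighted voting, can be illustrated with a minimal sketch. This is not the thesis implementation. To stay dependency-free it swaps the SVM base classifier for a one-feature threshold classifier (a decision stump), and the toy data, the single numeric feature, and the names `train_stump`, `adaboost`, and `predict` are assumptions made only for illustration. A label of +1 stands for "prosodic phrase boundary after this word" and -1 for "no boundary".

```python
import math
import random


def train_stump(samples):
    """Fit a threshold classifier on (x, y) pairs with y in {-1, +1}.

    Stand-in for the SVM base classifier used in the thesis.
    """
    best = None
    for threshold in sorted({x for x, _ in samples}):
        for sign in (1, -1):
            errors = sum(
                1 for x, y in samples
                if (sign if x >= threshold else -sign) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, sign)
    _, threshold, sign = best
    return lambda x: sign if x >= threshold else -sign


def adaboost(data, rounds=10, seed=0):
    """Return a list of (vote weight, classifier) pairs."""
    rng = random.Random(seed)
    n = len(data)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Draw a training sample according to the current example weights.
        sample = rng.choices(data, weights=weights, k=n)
        clf = train_stump(sample)
        # Weighted error of this classifier on the full training set.
        err = sum(w for w, (x, y) in zip(weights, data) if clf(x) != y)
        if err >= 0.5:
            continue  # no better than chance: discard this round
        err = max(err, 1e-10)  # avoid division by zero for a perfect round
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, clf))
        # Raise the weight of misclassified examples, lower the others.
        weights = [
            w * math.exp(-alpha * y * clf(x))
            for w, (x, y) in zip(weights, data)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all base classifiers."""
    score = sum(alpha * clf(x) for alpha, clf in ensemble)
    return 1 if score >= 0 else -1


# Toy data: one hypothetical numeric feature per word, label +1 = boundary.
toy = [(1, -1), (2, -1), (3, -1), (6, 1), (7, 1), (8, 1)]
model = adaboost(toy)
predictions = [predict(model, x) for x, _ in toy]
```

In a real system each example would be a feature vector (for instance word, part of speech, word length, and chunk tag, as listed in the abstract) rather than a single number, and the base classifier would be an SVM trained on the resampled data.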
Keywords/Search Tags:Speech pause, Prosodic phrase boundary, Chinese Chunk, Conditional Random Fields, AdaBoost algorithm