Recognition Of Prosodic Phrases Based On An Unlabeled Corpus And "Adhesion" Culling Strategy

Posted on:2017-06-25

Degree:Master

Type:Thesis

Country:China

Candidate:Y Y Cai

Full Text:PDF

GTID:2348330512951238

Subject:Computer application technology

Abstract/Summary:

Speech synthesis is no longer a novelty: in an era of "reading" and "listening", a machine that can "speak" by itself is not a fantasy. The intelligibility of synthesized speech already meets users' requirements, but its quality still needs improvement, mainly because synthesized speech has low fluency and a poor sense of rhythm. Improving the fluency of synthesized speech is therefore an urgent problem. In the text-processing stage of speech synthesis, the division of text into prosodic phrases has an important influence on the fluency of the synthesized speech. At present, most research on prosodic structure prediction is based on manually annotated corpora. Such corpora are small, and expanding them is subject to many restrictions. To address this problem, this thesis studies prosodic structure prediction based on an unlabeled corpus. Exploiting the pause function of punctuation, it proposes a prosodic phrase recognition method that uses an unlabeled corpus and an "adhesion" culling strategy. The main work includes the following aspects:

(1) Division of punctuation into levels and acquisition of the unlabeled corpus. Based on the idea of using punctuation marks to simulate prosodic marks, this thesis proposes dividing punctuation marks into levels and assigning each level a different weight according to its pause duration. After many experiments, we determine the best classification of punctuation marks and the optimal parameter assignment for each level. We then obtain a large-scale unlabeled corpus based on the multilevel punctuation marks.

(2) "Adhesion" of grammatical words based on mutual information. In natural language processing, mutual information is described as a measure of the degree of correlation between two words. In this thesis, we use mutual information to count and measure the adjacency strength of any two grammatical words in the large-scale unlabeled corpus (on which only automatic word segmentation and POS tagging have been performed). On this basis, we "adhere" words that are closely related to each other and call the result an "adhesion unit". We observe that the words in an "adhesion unit" are not separated by a prosodic phrase boundary, because they are bound to each other.

(3) Automatic recognition of prosodic phrases based on the Maximum Entropy model and the word "adhesion" culling strategy. First, we build a Maximum Entropy model from the large-scale unlabeled corpus for automatic prediction of prosodic phrases, and we determine the value of the parameter K in the Top-K method through analysis and statistics of the manually labeled corpus. Second, after splitting sentences at punctuation marks, we make a preliminary prediction of prosodic phrases using the Maximum Entropy model and the Top-K method. Finally, we apply the "adhesion" algorithm to the corpus to be recognized and tag its adhesion units. According to these annotations, the preliminary predictions are culled, and the final recognition results are obtained.
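The adhesion and culling steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives neither the exact mutual information formula nor any threshold, so the sketch assumes pointwise mutual information over adjacent token pairs and a hypothetical cutoff of 2.0. The function names (`pmi_table`, `adhesion_gaps`, `cull_boundaries`), the toy corpus, and the candidate boundaries are all invented for illustration. In the thesis, the candidates would come from the Maximum Entropy model and the Top-K method, which are not reproduced here.

```python
import math
from collections import Counter


def pmi_table(sentences):
    """Pointwise mutual information for every adjacent token pair.

    `sentences` is a list of token lists (already segmented).
    Assumption: PMI stands in for the thesis's mutual information measure.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sent in sentences:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    table = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table


def adhesion_gaps(tokens, table, threshold):
    """Indices i where tokens[i] and tokens[i + 1] form an adhesion unit.

    Pairs never seen in the corpus are treated as not adhering.
    """
    return {
        i
        for i in range(len(tokens) - 1)
        if table.get((tokens[i], tokens[i + 1]), float("-inf")) >= threshold
    }


def cull_boundaries(candidates, glued):
    """Drop candidate boundaries that would split an adhesion unit.

    A boundary at index i means a break after tokens[i].
    """
    return [i for i in candidates if i not in glued]


# Toy corpus and hypothetical threshold, for illustration only.
corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["sat", "the", "dog"]]
table = pmi_table(corpus)
tokens = ["the", "cat", "sat"]
glued = adhesion_gaps(tokens, table, threshold=2.0)
# Candidate boundaries after "the" (0) and after "cat" (1).
print(cull_boundaries([0, 1], glued))  # prints [1]
```

In this toy run the pair ("the", "cat") scores above the cutoff, so the candidate boundary between them is culled, while the boundary after "cat" survives. On a real corpus the threshold would have to be tuned, and very rare pairs tend to receive inflated PMI scores, so a minimum-count filter may be needed.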

Keywords/Search Tags:

Unlabeled corpus, Prosodic phrase boundary, Speech synthesis, Maximum Entropy, Mutual Information

Related items

1	Chinese Prosodic Phrases Based On Text And Phonetic Features Boundary Prediction
2	The Research Of Prosodic Control Algorithm And Realization For Chinese Speech Synthesis
3	A Research On Identification Of Chinese Prosodic Phrase Boundary Based On Chinese Chunk
4	Research And Implementation Of Chinese Prosodic Structure Prediction Model
5	Speech Synthesis And Speech Processing
6	Research On Predicting Chinese Prosodic Boundary Based On Syntactic Features
7	Studies On Prepositional Phrase Boundary Identification Based On Usage Attribute
8	HMM-based Mandarin Speech Synthesis And Prosodic Optimized
9	Research And Implementation Of End-to-End Prosodic Speech Synthesis System
10	Research On Identification Of Kazakh Basic Noun Phrase Based On Maximum Entropy