A Research On The Phrase Structure Based On Corpus

Posted on: 2013-03-30 | Degree: Master | Type: Thesis
Country: China | Candidate: F M Li
GTID: 2248330374456476 | Subject: Computer application technology
Abstract/Summary:
Chinese syntactic parsing is an important task in Chinese information processing. Syntactic parsing lets a computer recover the grammatical structure of a sentence, which it needs in order to understand the meaning of the language correctly. However, the current performance of Chinese syntactic parsing does not meet the needs of applications, and this seriously affects semantic analysis of Chinese. Approaches that borrow models that perform well on English have not achieved comparable results, so how to use the characteristics of Chinese itself to improve performance is a research focus. In recent years, the construction of syntactic treebanks based on Chinese grammar has promoted the development of Chinese syntactic parsing. Moreover, some scholars have proposed the concept of the Event Description Clause (EDC) according to structural characteristics: a complete sentence can be divided into several simple clauses according to the events it describes. The task of analysing the syntactic structure of Chinese sentences has been redefined accordingly, and related evaluations have been held many times. This thesis summarizes the task of Chinese syntactic parsing on the basis of the Chinese syntactic treebank corpus used for these evaluations, finds that phrases play an important role in Chinese syntactic parsing, and takes phrase structure parsing as its main research content.
The phrase structure parsing research in this thesis focuses on how to eliminate phrase structure ambiguity. The thesis argues that the main cause of ambiguity in syntactic analysis models is that, under the standard phrase-based grammar theory, the functional type of Chinese words and phrases does not necessarily reflect their syntactic function accurately. The thesis therefore concentrates on how to determine the syntactic function of a phrase accurately and how to resolve ambiguity on the basis of a corpus.
First, corpus statistics reveal the complexity of real language. Based on these statistics, the thesis selects 553 combination patterns as its object of study, using three criteria: a frequency threshold of 10, a single center, and composition only from phrase constituents. These patterns describe 91.53% of phrase instances. Second, because phrases form an open and unbounded set, a rule-based method is adopted to determine the syntactic function of phrases accurately and to constrain their internal components. Moreover, all kinds of syntactic and semantic features can be used in rules in the form of a Complex Feature Set. Constructing the Phrase Structure Rule Knowledge Base is therefore the main means of ambiguity resolution in this thesis. Because writing rule content is a long-term, repeated process, the thesis designs a table-based recording format. It counts 5781 ambiguity formats in the corpus and derives knowledge for ambiguity resolution according to the theory of potential ambiguity. Finally, the thesis carries out an ambiguity resolution experiment on a number of ambiguous instances and achieves good results, which indicates that the method is feasible and effective.
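The frequency-based selection of combination patterns can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `select_patterns`, the pattern encoding (phrase label plus a tuple of constituent tags), the toy data, and the reading of the threshold as "at least 10 occurrences" are all assumptions made for illustration. The single-center and constituent-type criteria are not modelled here.

```python
from collections import Counter


def select_patterns(instances, min_freq=10):
    """Keep patterns seen at least `min_freq` times and report their coverage.

    `instances` is a list with one hashable pattern per phrase instance.
    The ">= min_freq" reading of the threshold is an assumption.
    """
    counts = Counter(instances)
    selected = {pattern for pattern, n in counts.items() if n >= min_freq}
    covered = sum(n for pattern, n in counts.items() if pattern in selected)
    coverage = covered / len(instances) if instances else 0.0
    return selected, coverage


# Hypothetical toy corpus: (phrase label, constituent tags) per instance.
instances = (
    [("np", ("a", "n"))] * 12
    + [("vp", ("v", "n"))] * 10
    + [("np", ("n", "n"))] * 3
)

selected, coverage = select_patterns(instances, min_freq=10)
print(len(selected), f"{coverage:.2%}")
```

On the toy data, the two patterns that reach the threshold account for 22 of the 25 instances. The same computation over a treebank would yield figures of the kind reported above (553 patterns, 91.53% of instances).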
Keywords/Search Tags: Corpus, Phrase Structure Rule, Ambiguity Resolution