Studies On Function Words Phrase Boundary Identification And Its Application In Syntactic Parsing

Posted on: 2017-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: X B Feng
GTID: 2348330485986718
Subject: Computer application technology
Abstract/Summary:
As one of the basic tasks of natural language processing, syntactic analysis underlies natural language understanding tasks such as semantic understanding and question answering. Drawing on the Chinese Function Word Usage Knowledge Base, this paper gives a preliminary discussion of how phrase boundary recognition can be applied in syntactic analysis and then applies function word usage to several phrase boundary recognition methods. On this basis, it proposes a parser built on phrase boundary recognition, called Phrase_Based Parser. Experiments on the CTB8.0 dataset show that Phrase_Based Parser can correct wrongly parsed trees caused by some phrase boundary errors, increasing the average accuracy by 6.9%. The work of this paper is as follows:

1) Statistical analysis of function words in CTB8.0. Using the Chinese Function Word Usage Knowledge Base, the paper analyzes the distribution of function words in CTB8.0; the statistics show that function words account for a very large proportion of the corpus. A CRF model was used to automatically identify and label function word usage in CTB8.0. Parsing the original CTB8.0 corpus with the Berkeley Parser shows that 40.76% of the phrase boundaries that contain function words were misidentified, which indicates that the performance of phrase boundary identification affects parsing accuracy.

2) Construction of a corpus database based on CTB8.0. The paper constructed a standard phrase boundary tagging corpus based on CTB8.0, designed a set of phrase marker symbols, and built a phrase marker treebank based on CTB8.0.

3) Recognition of phrase boundaries with rules and statistical models. Rules based on function word usage and statistical models were applied to the recognition of prepositional phrases, conjunctional phrases, and phrases containing the particle "的" (de). The experimental results show that rule-based phrase boundary identification reached an average accuracy of 47.06%, identification based on the CRF model reached 73.69%, and the CNN model reached 75.54%.

4) A parser based on phrase boundary recognition, called Phrase_Based Parser, was proposed. Experiments on the CTB8.0 dataset show that the F score of Phrase_Based Parser is 2.72% higher than that of the Berkeley Parser on sentences that contain prepositional phrases, and 2.72% higher than that of the Berkeley Parser on sentences that contain conjunctional phrases.

5) A parser based on the usage of function words, called Usage_Based Parser, was proposed. It applies the phrase boundary identification results to syntactic analysis. The experimental results show that the syntactic analysis accuracy for the preposition "?" is 20.69%, and that the accuracy of sentence-level syntactic analysis for the related phrases increased by an average of 6.9%.
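To make the phrase boundary identification task more concrete, the following is a minimal Python sketch of one common way to pose it as sequence labeling, the form a CRF tagger typically consumes. It is an illustration under stated assumptions, not the thesis's implementation: the BIO tag scheme, the `PP` label, the function names, and the example sentence are all hypothetical, and the abstract does not show the phrase marker symbols that were actually designed.

```python
from typing import List, Tuple

# A phrase span: (start index, end index exclusive, label).
# The "PP" label and the BIO scheme are assumptions for illustration only.
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping phrase spans as one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into phrase spans; stray I- tags are ignored."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        if tag.startswith("I-") and label == tag[2:]:
            continue  # still inside the currently open span
        if label is not None:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def span_prf(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Precision, recall, F1 where a span counts only if both boundaries match."""
    correct = len(set(gold) & set(pred))
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical example: "他 在 北京 工作" with the prepositional phrase "在 北京".
tokens = ["他", "在", "北京", "工作"]
gold_spans = [(1, 3, "PP")]
gold_tags = spans_to_bio(len(tokens), gold_spans)

# A prediction whose right boundary is one token too far scores zero,
# which is why boundary errors carry over into parsing errors.
pred_spans = [(1, 4, "PP")]
print(gold_tags, span_prf(gold_spans, pred_spans))
```

Scoring by exact span match is a deliberately strict choice here: a phrase with either boundary misplaced counts as wrong, which mirrors the idea that a single boundary error can lead to a wrongly parsed tree.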
Keywords/Search Tags: Function Word, Boundary Recognition, CRF, CNN, Syntactic Parsing