
Research On Chinese Preposition Phrase Identification Based On Cascaded Conditional Random Fields

Posted on: 2014-02-21, Degree: Master, Type: Thesis
Country: China, Candidate: L Zhang, Full Text: PDF
GTID: 2248330395987137, Subject: Computer software and theory
Abstract/Summary:
Phrase identification is one of the most important subjects in natural language processing. By dividing a sentence into smaller units, Chinese phrase identification simplifies the sentence structure and reduces the difficulty of parsing. Among the types of Chinese phrases, the preposition phrase is the most important and the most frequently used. Improving the performance of Chinese preposition phrase identification will therefore greatly benefit many natural language processing applications, such as parsing, machine translation, and information retrieval. Drawing on the characteristics of prepositions and preposition phrases, this thesis investigates statistical methods for Chinese preposition phrase identification, mainly in the following two respects.

First, the thesis proposes an identification method based on collocation. Using the features of prepositions and the definition of collocation, it transforms preposition phrase identification into identifying the collocation between the preposition itself and the right boundary word of the phrase, and it uses new tags and collocation features to perform reversed and layered identification. Experimental results show that the approach makes significant progress: the F value reaches 83.41%, about 4.6% higher than previously published results. The gain is especially marked for long-distance preposition phrases and nested preposition phrases, which are the types most difficult to identify.

Second, the thesis proposes an approach to Chinese preposition phrase identification based on cascaded conditional random fields. The approach divides the identification task into three steps. First, identify the preposition phrase units in a sentence and generate the sentence frame based on rules. Second, obtain parsing information by analyzing the phrase structure of the frame, and search for the best parsing result. Third, combine the results of preposition phrase identification and parsing. The first two steps are learned with separate conditional random field models, and the last step produces the combined result. Experiments show that the phrase structure analysis provides useful guidance for identification, and that combining the two results improves identification performance.
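As a rough illustration of the collocation idea described above, the following Python sketch pairs each preposition with a right boundary word and scores the resulting spans with the F value. The tag names `P`, `E`, and `O`, the function names, and the toy tag sequence are assumptions made for illustration only; the abstract does not specify the thesis's tag set, and in practice a trained model, not this helper, would produce the tags.

```python
from typing import List, Set, Tuple

# A span is (index of the preposition, index of the right boundary word), inclusive.
Span = Tuple[int, int]


def spans_from_tags(tags: List[str]) -> List[Span]:
    """Pair each preposition with a right boundary word.

    Hypothetical tag set (not taken from the thesis):
      "P" = preposition that opens a phrase
      "E" = right boundary word that closes the most recently opened phrase
      "O" = any other word
    A stack makes nested phrases close innermost first. A single word that
    closes two phrases at once would need a richer tag and is not modeled here.
    """
    open_positions: List[int] = []
    spans: List[Span] = []
    for i, tag in enumerate(tags):
        if tag == "P":
            open_positions.append(i)
        elif tag == "E" and open_positions:
            spans.append((open_positions.pop(), i))
    return sorted(spans)


def f_measure(predicted: Set[Span], gold: Set[Span]) -> float:
    """Balanced F value: harmonic mean of precision and recall over exact spans."""
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example: an outer phrase (1..7) containing a nested phrase (3..5).
toy_tags = ["O", "P", "O", "P", "O", "E", "O", "E", "O"]
predicted_spans = spans_from_tags(toy_tags)
print(predicted_spans)  # [(1, 7), (3, 5)]

# Hypothetical gold annotation that disagrees on the inner right boundary.
gold_spans = {(1, 7), (3, 6)}
print(f_measure(set(predicted_spans), gold_spans))  # 0.5
```

Scoring exact spans means a phrase counts as correct only when both the preposition and the right boundary word match, which is why a single misplaced boundary in the toy example halves the F value.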
Keywords/Search Tags: Preposition phrase identification, Collocation, Cascaded conditional random fields, Phrase structure analysis