Research On Chinese Syntactic Parsing Based On Cascaded Conditional Random Fields

Posted on: 2011-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: X Liu
GTID: 2178360302988551
Subject: Computer software and theory
Abstract/Summary:
Chinese parsing is a key technology of Chinese information processing and the foundation of deep analysis of Chinese. Improvements in parsing will give a strong impetus to natural language processing applications such as machine translation, information retrieval, and information extraction. Current Chinese parsing techniques cannot yet satisfy the requirements of Chinese information processing, so research on Chinese parsing is of great significance.

This thesis proposes a statistical parsing method based on maximal noun phrase pre-processing, which separates the recognition and parsing of maximal noun phrases from the syntactic parsing of the rest of the sentence. It also investigates Chinese parsing techniques based on statistical learning methods. The thesis covers the following work.

The first piece of work is maximal noun phrase analysis. The maximal noun phrase is the most important kind of NP, and identifying and analyzing it helps in understanding natural language sentences. Because traditional parsing methods do not handle maximal noun phrases well, this thesis treats them as a separate problem in order to reduce the complexity of parsing. Based on the characteristics of maximal noun phrases, a method based on chunk parsing is presented to mitigate their effect on parsing.

Secondly, a method based on Cascaded Conditional Random Fields is proposed for Chinese parsing. Rather than identifying all phrases with a single uniform model, the process is divided into two stages: first, identify the syntactic units in a sentence; second, analyze the relations among those syntactic units. In this way, each sub-problem can be solved with its own model and search strategy, which also reduces the difficulty of parsing. The thesis selects Cascaded Conditional Random Fields as the multi-layer model.

Finally, a parsing algorithm based on local optimization is proposed for decoding the sentence. The algorithm uses a breadth-first strategy to search for a locally optimal solution, which mitigates the error propagation problem of traditional deterministic parsing algorithms.

Several experiments are carried out on the CIPS-ParsEval 2009 data set. Analysis of the experimental results shows that the method based on Cascaded Conditional Random Fields clearly improves the accuracy and recall of Chinese parsing and reduces its complexity, so that the system can process text more quickly.
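As a rough illustration of the two-stage idea described in the abstract (identify syntactic units, then analyze relations among them), the following sketch chains two sequence labelers that emit BIO tags. It is not the thesis's implementation: the abstract gives no features, label sets, or training details, so the two rule-based taggers here are toy stand-ins for trained CRF layers, and all names (`bio_to_spans`, `apply_layer`, `cascade_parse`, `toy_unit_tagger`, `toy_relation_tagger`) and the POS labels are assumptions made for the example.

```python
from typing import Callable, List, Tuple, Union

# A node is (content, label): content is a word for leaves, or a list of
# child nodes for phrases built by an earlier layer.
Node = Tuple[Union[str, list], str]
Tagger = Callable[[List[Node]], List[str]]


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BIO tags into (start, end, label) spans, end exclusive."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        if tag.startswith("I-") and tag[2:] == label:
            continue  # current span keeps growing
        if start is not None:
            spans.append((start, i, label))
            start, label = None, None
        if tag != "O":
            start, label = i, tag[2:]
    return spans


def apply_layer(units: List[Node], tagger: Tagger) -> List[Node]:
    """Run one layer: tag the units, then merge each tagged span into a node."""
    merged, covered = [], 0
    for start, end, label in bio_to_spans(tagger(units)):
        merged.extend(units[covered:start])       # untagged units pass through
        merged.append((units[start:end], label))  # new phrase with children
        covered = end
    merged.extend(units[covered:])
    return merged


def cascade_parse(tokens: List[Node], layers: List[Tagger]) -> List[Node]:
    """Feed the output of each layer into the next one."""
    units = tokens
    for tagger in layers:
        units = apply_layer(units, tagger)
    return units


def toy_unit_tagger(units: List[Node]) -> List[str]:
    """Toy stand-in for layer 1: adjacent adjectives/nouns form an NP."""
    tags, prev_nominal = [], False
    for _, label in units:
        nominal = label in {"a", "n"}
        tags.append(("I-NP" if prev_nominal else "B-NP") if nominal else "O")
        prev_nominal = nominal
    return tags


def toy_relation_tagger(units: List[Node]) -> List[str]:
    """Toy stand-in for layer 2: a verb followed by an NP forms a VP."""
    tags = ["O"] * len(units)
    for i in range(len(units) - 1):
        if units[i][1] == "v" and units[i + 1][1] == "NP":
            tags[i], tags[i + 1] = "B-VP", "I-VP"
    return tags


sentence = [("他", "r"), ("喜欢", "v"), ("红", "a"), ("苹果", "n")]
tree = cascade_parse(sentence, [toy_unit_tagger, toy_relation_tagger])
```

For this toy input, the first layer groups 红 苹果 into an NP and the second layer attaches that NP to the verb 喜欢 as a VP. In a real cascade, each layer would be a separately trained CRF whose input features include the labels produced by the layer below it.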
Keywords/Search Tags: Chinese parsing, Maximal noun phrase, Cascaded conditional random fields, Phrase structure grammar