
The Design And Implementation Of Syntax Analysis System Based On Chinese

Posted on: 2009-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: G Shao
Full Text: PDF
GTID: 2178360245972897
Subject: Computer application technology
Abstract/Summary:
Parsing is one of the crucial problems in natural language processing. Its main task is to automatically recognize the syntactic structure of sentences. As a significant subject in the field of Chinese information processing, parsing can also promote the development of related areas of linguistics.

This thesis introduces the theory and technology of syntactic analysis in natural language processing and compares several existing parsing algorithms and models. Building on previous research, it improves the traditional chart parsing algorithm by combining bottom-up and top-down strategies. Rules are selected dynamically, which improves both the efficiency and the accuracy of the analysis. A part-of-speech tag set and a phrase tag set are defined according to practical applications, and a library of context-free syntax rules is designed from common syntactic patterns. On this basis, three modules are developed: a word segmentation module based on a maximal word length algorithm, a POS tagging module based on a Hidden Markov Model, and a parsing module based on the improved chart parsing algorithm. Together they form a prototype system for Chinese parsing.

In addition, to address the difficulty of parsing long Chinese sentences, the usages and functions of Chinese punctuation are studied and a hierarchical parsing approach is proposed. The approach marks the punctuation while dividing complex sentences. Based on the dividing rules, a long sentence is divided into a sequence of sentence units. Each unit is first analyzed independently (first-level analysis), the outcome of the first-level analysis is used as the input of the second-level analysis, and the final parse tree is produced by a second-level merging step.

Using the POS tag set and syntax rule library described above, the PFR annotated corpus of the People's Daily, produced by the Institute of Computational Linguistics (ICL) of Peking University, and the TCT973 treebank are used as training samples, from which some samples are extracted for a small-scale test on Chinese text. The test shows that the improved parsing algorithm and the hierarchical long-sentence parsing rules are practicable and feasible.
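
As an illustration of the segmentation step by maximal word length named in the abstract, the following is a minimal sketch of forward maximum matching. The abstract does not say whether the thesis scans forward or backward, so the forward direction is an assumption here, and the function name `forward_max_match`, the `max_len` parameter, and the toy `LEXICON` are hypothetical and not taken from the thesis.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation (assumed forward variant).

    At each position, take the longest substring (up to max_len characters)
    that appears in the lexicon; fall back to a single character otherwise.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy lexicon, for illustration only.
LEXICON = {"中文", "句法", "分析", "句法分析", "系统"}

if __name__ == "__main__":
    print(forward_max_match("中文句法分析系统", LEXICON))
```

With this toy lexicon, the longer entry "句法分析" is preferred over the shorter "句法" and "分析", which is the defining behavior of maximum matching. A real system would need a much larger dictionary and some strategy for ambiguous overlaps, which this sketch does not attempt.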
Keywords/Search Tags: Chinese syntactic analysis, Chart algorithm, Rules, Part-of-Speech Tagging, Word Segmentation