The Design And Implementation Of Syntax Analysis System Based On Chinese

Posted on:2009-11-30

Degree:Master

Type:Thesis

Country:China

Candidate:G Shao

GTID:2178360245972897

Subject:Computer application technology

Abstract/Summary:

Parsing is one of the central problems in natural language processing. Its main task is to recognize the syntactic structure of sentences automatically. As a significant topic in Chinese information processing, parsing can also promote the development of related fields. This thesis introduces the theory and techniques of syntactic analysis in natural language processing and compares several existing parsing algorithms and models. Building on previous research, it improves the traditional chart parsing algorithm by combining bottom-up and top-down strategies. Rules are selected dynamically, which improves both the efficiency and the accuracy of the analysis. A part-of-speech tag set and a phrase tag set are defined according to practical needs, and a library of context-free syntax rules is designed from common syntactic patterns. On this basis, three modules are developed: a word segmentation module based on the maximum matching algorithm, a POS tagging module based on the Hidden Markov Model, and a parsing module based on the improved chart algorithm. Together they form a prototype system for Chinese parsing.

In addition, to address the difficulty of parsing long Chinese sentences, the usages and functions of Chinese punctuation are studied and a hierarchical parsing approach is proposed. The approach marks the punctuation while dividing complex sentences. Based on the dividing rules, a long sentence is divided into a sequence of sentence units. Each unit is first analyzed independently, the output of this first-level analysis is used as the input to the second-level analysis, and the final parse tree is produced by a second-stage merge.

Following the POS tag set and syntax rule library described above, the annotated People's Daily corpus (PFR) produced by the Institute of Computational Linguistics (ICL) of Peking University and the TCT973 tree rules are used as training samples, from which some samples are extracted for a small-scale test on Chinese text. The test shows that the improved analysis algorithm and the layered long-sentence parsing rules are practicable and feasible.
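The abstract names a maximum matching segmenter but gives no details, so the following is only a minimal sketch of the common forward variant, not the thesis's implementation. The function name `forward_max_match`, the toy lexicon, and the single-character fallback are assumptions made for illustration.

```python
def forward_max_match(text, lexicon, max_len=None):
    """Greedy left-to-right segmentation.

    At each position, take the longest substring found in `lexicon`;
    fall back to a single character when nothing matches.
    """
    if max_len is None:
        # Longest lexicon entry bounds the window; never let it drop below 1.
        max_len = max(1, max((len(w) for w in lexicon), default=1))
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy lexicon, for illustration only.
lexicon = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", lexicon))
# → ['研究生', '命', '起源']
```

The output shows a known weakness of greedy matching: the intended reading is 研究 / 生命 / 起源, but the longest-first rule commits to 研究生 too early. This kind of ambiguity is one reason segmentation is usually followed by statistical stages such as the HMM tagger mentioned above.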

Keywords/Search Tags:

Chinese syntactic analysis, Chart algorithm, Rules, Part-of-Speech Tagging, Word Segmentation

Related items

1	The Design And Implementation Of Syntax Analysis System Based On Chart Algorithm
2	The Study Of Rule-based Chinese Words Tagging Method
3	Research On Improvements Of Chinese Part-of-Speech Tagging System Based On Statistical Model
4	Research On Chinese Word Segmentation And Part-of-speech Tagging Based On Deep Learning Methods
5	Research And Implementation Of Modify Chinese Part-of-Speech Tagging Based On FST Technology
6	Chinese Part-of-Speech Tagging Based On Ameliorated Hidden Makov Model
7	Chinese POS Tagging Based On Maximum Entropy
8	Statistics-based Chinese Pos Tagging Method
9	Research On Chinese Part-of-speech Tagging Based On Semi Hidden Markov Model
10	Chinese Word Found Its Part Of Speech Tagging