
A Study On The Computation Of Chinese Chunks

Posted on: 2003-04-29  Degree: Doctor  Type: Dissertation
Country: China  Candidate: S J Li  Full Text: PDF
GTID: 1118360185496951  Subject: Computer application technology

Abstract/Summary:
The concept of the "chunk" was first proposed in cognitive psychology and was later applied in information processing theory and general intelligent systems. It has since spread to the field of Computational Linguistics, where chunking embodies a "divide-and-conquer" strategy. In this thesis, the computation of chunks covers not only chunk parsing but also the computation of similarity between chunks.

Complete syntactic parsing, a key problem of Natural Language Processing, remains unsolved. Chunking is therefore used to reduce its difficulty: it refers to techniques for recognising relatively simple syntactic structures. This thesis discusses the methods and techniques of chunk parsing.

We first point out the difficulties of full syntactic parsing and argue that chunk parsing is one way to address them. We also survey the current state of chunk parsing and illustrate both rule-based and statistical techniques, showing that the task is both important and feasible.

We then summarize existing definitions of chunks and, building on this prior work, give a definition for Chinese chunks. Because it is laborious to annotate a corpus with chunk tags by hand, such data is mostly acquired by transforming an existing treebank; the training and test data in this thesis are extracted from the UPenn Chinese Treebank. Based on our chunk definition and the available corpus, 12 Chinese chunk categories are introduced, together with the chunk tags used in the tagging process.

The text chunking system in this thesis adopts a hybrid model that combines rule-based and statistical methods. For the first time, we apply the mature statistical modelling technique, the Maximum Entropy (ME) model, to the division and recognition of Chinese chunks.
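The division and recognition of chunks is commonly cast as per-token tagging. As an illustrative sketch (the chunk labels NP and VP here are generic examples, not the thesis's actual 12-category Chinese tag set), a chunked sentence can be encoded with B/I/O tags, one per token:

```python
def to_bio(chunks):
    """Convert (label, tokens) chunk pairs into per-token BIO tags:
    B-X marks the first token of a chunk of type X, I-X the rest,
    and O marks tokens outside any chunk (label is None)."""
    tags = []
    for label, tokens in chunks:
        if label is None:
            tags += ["O"] * len(tokens)
        else:
            tags += ["B-" + label] + ["I-" + label] * (len(tokens) - 1)
    return tags

sentence = [("NP", ["the", "little", "dog"]),
            ("VP", ["barked"]),
            (None, ["."])]
print(to_bio(sentence))
# → ['B-NP', 'I-NP', 'I-NP', 'B-VP', 'O']
```

Under this encoding, chunk parsing reduces to assigning one tag per token, which is exactly the kind of classification task an ME model handles.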
In practice, the ME model reaches high accuracy with knowledge-poor features. A further advantage is its reusability: the ME framework is independent of any particular natural language task. On the rule-based side, the finite-state automaton (FSA) is used for its high efficiency, which follows from its determinism. In addition, transformation-based error-driven machine learning is incorporated to improve the system: it compares the tagging results of the two methods above with the correct result and, through learning and feedback, produces a set of transformation rules.

Feature selection is a key problem of the ME model and determines the performance of text chunking. For this task, we propose that word, part of speech, syntactic tag, and rhythm are the main factors from which a feature is constructed...
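The "knowledge-poor features" mentioned above are typically simple context-window templates over words and part-of-speech tags. A minimal sketch of such a template follows; the feature names and the template itself are illustrative assumptions, not the thesis's exact feature set (which also draws on syntactic tags and rhythm):

```python
def chunk_features(words, pos, i):
    """Knowledge-poor feature template for token position i:
    the current word, its POS tag, and the POS tags of its
    immediate neighbours, with sentence-boundary placeholders."""
    left = pos[i - 1] if i > 0 else "<S>"
    right = pos[i + 1] if i < len(pos) - 1 else "</S>"
    return ["w0=" + words[i], "p0=" + pos[i],
            "p-1=" + left, "p+1=" + right]

words = ["我", "喜欢", "红", "苹果"]   # "I like red apples"
pos = ["PN", "VV", "JJ", "NN"]
print(chunk_features(words, pos, 2))
# → ['w0=红', 'p0=JJ', 'p-1=VV', 'p+1=NN']
```

Each token's feature list is fed to the ME classifier, which weights the features to predict that token's chunk tag; no lexicon or hand-built grammar is required, which is what makes the features "knowledge-poor".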
Keywords/Search Tags:Natural Language Processing, Syntactic Parsing, Chunk parsing, Maximum Entropy Principle, Finite State Automaton