
A Study On The Computation Of Chinese Chunks

Posted on: 2003-04-29  Degree: Doctor  Type: Dissertation
Country: China  Candidate: S J Li  Full Text: PDF
GTID: 1118360185496951  Subject: Computer application technology

Abstract/Summary:
The concept of the "chunk" was first proposed in cognitive psychology and was later applied in information processing theory and general intelligent systems. It has since spread to the field of Computational Linguistics, where chunking embodies a "divide-and-conquer" strategy. In this thesis, the computation of chunks covers not only chunk parsing but also the computation of similarity between chunks.

Complete syntactic parsing, a key problem of Natural Language Processing, remains unsolved. Chunking is therefore used to reduce its difficulty: it refers to techniques for recognising relatively simple syntactic structures. This thesis discusses the methods and techniques of chunk parsing.

We first point out the difficulties of full syntactic parsing and argue that chunk parsing is one way to address them. We also survey the current state of chunk parsing and illustrate both rule-based and statistical techniques, showing that the task is both important and feasible.

We then summarize existing definitions of chunks and, building on this prior work, give a definition for Chinese chunks. Because it is laborious to annotate a corpus with chunk tags by hand, such data is mostly acquired by transforming an existing treebank; the training and test data in this thesis are extracted from the UPenn Chinese Treebank. Based on our chunk definition and the available corpus, 12 Chinese chunk categories are introduced, together with the chunk tags used in the tagging process.

The text chunking system in this thesis adopts a hybrid model that combines rule-based and statistical methods. For the first time, we apply the mature statistical modelling technique, the Maximum Entropy (ME) model, to the division and recognition of Chinese chunks.
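The division and recognition of chunks is commonly cast as per-token tagging. As an illustrative sketch (the chunk labels NP and VP here are generic examples, not the thesis's actual 12-category Chinese tag set), a chunked sentence can be encoded with B/I/O tags, one per token:

```python
def to_bio(chunks):
    """Convert (label, tokens) chunk pairs into per-token BIO tags:
    B-X marks the first token of a chunk of type X, I-X the rest,
    and O marks tokens outside any chunk (label is None)."""
    tags = []
    for label, tokens in chunks:
        if label is None:
            tags += ["O"] * len(tokens)
        else:
            tags += ["B-" + label] + ["I-" + label] * (len(tokens) - 1)
    return tags

sentence = [("NP", ["the", "little", "dog"]),
            ("VP", ["barked"]),
            (None, ["."])]
print(to_bio(sentence))
# → ['B-NP', 'I-NP', 'I-NP', 'B-VP', 'O']
```

Under this encoding, chunk parsing reduces to assigning one tag per token, which is exactly the kind of classification task an ME model handles.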
In practice, the ME model reaches high accuracy with knowledge-poor features. A further advantage is its reusability: the ME framework is independent of any particular natural language task. On the rule-based side, the finite-state automaton (FSA) is used for its high efficiency, which follows from its determinism. In addition, transformation-based error-driven machine learning is incorporated to improve the system: it compares the tagging results of the two methods above with the correct result and, through learning and feedback, produces a set of transformation rules.

Feature selection is a key problem of the ME model and determines the performance of text chunking. For this task, we propose that word, part of speech, syntactic tag, and rhythm are the main factors from which a feature is constructed...
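The "knowledge-poor features" mentioned above are typically simple context-window templates over words and part-of-speech tags. A minimal sketch of such a template follows; the feature names and the template itself are illustrative assumptions, not the thesis's exact feature set (which also draws on syntactic tags and rhythm):

```python
def chunk_features(words, pos, i):
    """Knowledge-poor feature template for token position i:
    the current word, its POS tag, and the POS tags of its
    immediate neighbours, with sentence-boundary placeholders."""
    left = pos[i - 1] if i > 0 else "<S>"
    right = pos[i + 1] if i < len(pos) - 1 else "</S>"
    return ["w0=" + words[i], "p0=" + pos[i],
            "p-1=" + left, "p+1=" + right]

words = ["我", "喜欢", "红", "苹果"]   # "I like red apples"
pos = ["PN", "VV", "JJ", "NN"]
print(chunk_features(words, pos, 2))
# → ['w0=红', 'p0=JJ', 'p-1=VV', 'p+1=NN']
```

Each token's feature list is fed to the ME classifier, which weights the features to predict that token's chunk tag; no lexicon or hand-built grammar is required, which is what makes the features "knowledge-poor".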
Keywords/Search Tags:Natural Language Processing, Syntactic Parsing, Chunk parsing, Maximum Entropy Principle, Finite State Automaton