The Research On The Technology Of Chunk Recognition And Its Implementation

Posted on: 2007-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: H M Zou
Full Text: PDF
GTID: 2178360215470248
Subject: Computer Science and Technology
Abstract/Summary:
Complete syntactic parsing is a key problem in Natural Language Processing. To reduce its difficulty, a "divide-and-conquer" strategy has been proposed in which shallow parsing, also called chunk parsing, is performed first. Chunk parsing is of great significance for syntactic analysis, machine translation, information retrieval and other tasks. This thesis discusses the methods and techniques of chunk parsing.

First, we analyze the difficulties of syntactic parsing and the importance of chunk parsing. We introduce the current state of chunk parsing and describe both rule-based and statistical techniques.

Next, we summarize the existing definitions of chunks. One goal of the thesis is to prepare chunks for building a chunk-aligned English-Chinese corpus. Building on the work of others and on the English chunk standard of CoNLL-2000, we propose a chunk definition with 5 chunk categories, classified by syntactic function. The training and testing corpora are converted from Chinese and English chunk banks, which were extracted from the UPenn Chinese Treebank and the UPenn Treebank according to the chunk definition and classification used in this thesis.

The Support Vector Machine (SVM) is a general-purpose learning method based on statistical learning theory. SVMs are widely regarded as superior to traditional methods, especially with small training samples, nonlinearity, and high-dimensional feature spaces. The thesis presents a method of chunk recognition based on SVM and transformation-based error-driven learning. Supervised transformation-based learning is applied to improve the output of the SVM: it compares the SVM output with the correct result and learns a set of transformation rules from the differences. These transformation rules deal effectively with language-specific phenomena and improve on the performance of the SVM alone. The experiments show encouraging results.

The thesis also presents several experiments that analyze the factors affecting the performance of the SVM. These factors include the definition of chunks, the selection of features in the feature vector, and the size of the training corpus. The conclusions from these experiments are expected to be helpful for research on chunk recognition.
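The error-driven correction step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the CoNLL-2000-style B-/I- tags, the single rule template ("change tag X to Y when the previous word's part of speech is P"), the toy sentence, and the `predicted` tags standing in for SVM output are all assumptions made for illustration.

```python
# Toy transformation-based error-driven learning over chunk tags.
# Assumptions (not from the thesis): B-/I- tag scheme, one rule template,
# and a hand-written `predicted` list that stands in for SVM output.

def count_errors(tags, gold):
    """Number of positions where the tags disagree with the gold standard."""
    return sum(t != g for t, g in zip(tags, gold))

def apply_rule(tags, pos, rule):
    """Return a copy of tags with rule (from_tag, to_tag, prev_pos) applied."""
    from_tag, to_tag, prev_pos = rule
    out = list(tags)
    for i in range(1, len(out)):
        if out[i] == from_tag and pos[i - 1] == prev_pos:
            out[i] = to_tag
    return out

def learn_rules(pos, predicted, gold, max_rules=10):
    """Greedily pick the rule that removes the most errors, until none helps."""
    current = list(predicted)
    rules = []
    for _ in range(max_rules):
        # Candidate rules are proposed only from the remaining errors.
        candidates = {
            (current[i], gold[i], pos[i - 1])
            for i in range(1, len(gold))
            if current[i] != gold[i]
        }
        best_rule, best_gain = None, 0
        for rule in sorted(candidates):
            patched = apply_rule(current, pos, rule)
            gain = count_errors(current, gold) - count_errors(patched, gold)
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:
            break
        current = apply_rule(current, pos, best_rule)
        rules.append(best_rule)
    return rules, current

# "He reckons the current account deficit will narrow"
pos = ["PRP", "VBZ", "DT", "JJ", "NN", "NN", "MD", "VB"]
gold = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP", "B-VP", "I-VP"]
# Hypothetical baseline output containing two boundary errors.
predicted = ["B-NP", "B-VP", "B-NP", "B-NP", "I-NP", "I-NP", "B-VP", "B-VP"]

rules, corrected = learn_rules(pos, predicted, gold)
print(rules)
print(corrected == gold)
```

On this toy input the learner finds two rules, one per baseline error, and the corrected tags match the gold standard. A realistic system would learn the rules on held-out training data, use richer rule templates, and apply the learned rules to unseen sentences.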
Keywords/Search Tags: syntax parsing, shallow parsing, support vector machines (SVM), transformation-based error-driven learning, chunk recognition