
A Method Combining Rule-Based And Statistics-Based Approaches For Chunking In English And Chinese

Posted on: 2007-07-31
Degree: Master
Type: Thesis
Country: China
Candidate: H X Yu
GTID: 2178360185985647
Subject: Computer Science and Technology
Abstract/Summary:
Full parsing is one of the hardest and most important tasks in natural language processing, and it is still regarded as an unsolved problem. Partial parsing, or chunking, reduces the difficulty of full parsing: it performs syntactic analysis of sentences with high accuracy and is already used in practical systems. This thesis discusses the definition of chunks, chunk types, and the theory and algorithms of chunk analysis. We then derive 10 chunk types from the syntactic tags of the UPenn Chinese Treebank through a series of rules and combination strategies, and describe these rules and strategies in detail.

The thesis introduces the Support Vector Machine (SVM), a statistical method, and Transformation-Based Learning (TBL), a rule-based method, and describes the principles, features, and characteristics of each. Compared with traditional artificial neural networks, the SVM performs much better in function representation, generalization, and efficiency, and it addresses problems such as model selection, overfitting, nonlinearity, high dimensionality, and local minima. TBL can combine many kinds of features and express a great deal of linguistic knowledge, which is valuable for other research. Combining the statistical and rule-based methods unites the advantages of both and yields satisfactory recognition results.

Finally, a series of experiments tests how different features (word, part of speech, and chunk tag) and different corpus sizes affect chunking performance. The SVM+TBL method is applied to the CoNLL-2000 corpus and to a Chinese corpus annotated according to our chunk definition. Based on the results, we analyze the shortcomings of the chunk definition and propose directions for future work.
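The SVM+TBL combination can be pictured as two stages: a statistical classifier assigns an initial chunk tag to every token, and TBL then learns an ordered list of rules that correct those tags. The Python sketch below shows only the TBL correction stage, under stated assumptions: chunk tags follow the CoNLL-2000-style IOB scheme (`B-NP`, `I-NP`, ...), the first-stage output (for example from an SVM) is simply given as a list, and the two rule templates (condition on the POS tag or the current chunk tag of an immediate neighbor) are hypothetical. The names `Rule`, `learn_tbl`, and `apply_rules` are illustrative and are not the templates or code used in the thesis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """Hypothetical TBL template: change from_tag to to_tag when the token at
    `offset` has the given POS tag (feature="pos") or chunk tag (feature="tag")."""
    from_tag: str
    to_tag: str
    feature: str
    offset: int
    value: str

    def matches(self, i, pos, tags):
        j = i + self.offset
        if tags[i] != self.from_tag or not 0 <= j < len(tags):
            return False
        context = pos if self.feature == "pos" else tags
        return context[j] == self.value

def apply_rule(rule, pos, tags):
    # Find all matches against the current tags first, then rewrite them together
    hits = [i for i in range(len(tags)) if rule.matches(i, pos, tags)]
    new = list(tags)
    for i in hits:
        new[i] = rule.to_tag
    return new

def candidate_rules(pos, tags, gold):
    # Instantiate the templates only at positions the current tagging gets wrong
    for i, (t, g) in enumerate(zip(tags, gold)):
        if t == g:
            continue
        for off in (-1, 1):
            j = i + off
            if 0 <= j < len(tags):
                yield Rule(t, g, "pos", off, pos[j])
                yield Rule(t, g, "tag", off, tags[j])

def gain(rule, pos, tags, gold):
    # Net improvement: errors fixed minus correct tags broken
    new = apply_rule(rule, pos, tags)
    fixed = sum(a != g and b == g for a, b, g in zip(tags, new, gold))
    broken = sum(a == g and b != g for a, b, g in zip(tags, new, gold))
    return fixed - broken

def learn_tbl(pos, tags, gold, min_gain=1, max_rules=20):
    """Greedily learn an ordered rule list that corrects the initial tags."""
    rules = []
    for _ in range(max_rules):
        # dict.fromkeys removes duplicates while keeping a deterministic order
        cands = list(dict.fromkeys(candidate_rules(pos, tags, gold)))
        if not cands:
            break
        best = max(cands, key=lambda r: gain(r, pos, tags, gold))
        if gain(best, pos, tags, gold) < min_gain:
            break
        rules.append(best)
        tags = apply_rule(best, pos, tags)
    return rules, tags

def apply_rules(rules, pos, tags):
    # At tagging time, apply the learned rules in the order they were learned
    for rule in rules:
        tags = apply_rule(rule, pos, tags)
    return tags

# Illustrative training sentence; `init` stands in for first-stage (e.g. SVM) output.
# "He reckons the current account deficit will narrow"
pos = ["PRP", "VBZ", "DT", "JJ", "NN", "NN", "MD", "VB"]
gold = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP", "B-VP", "I-VP"]
init = ["B-NP", "B-VP", "B-NP", "B-NP", "I-NP", "I-NP", "B-VP", "I-VP"]

rules, corrected = learn_tbl(pos, init, gold)
print(rules)  # one rule: B-NP -> I-NP when the previous POS tag is DT

# Reuse the learned rule on a new sentence: "the big dog barks"
print(apply_rules(rules, ["DT", "JJ", "NN", "VBZ"], ["B-NP", "B-NP", "I-NP", "B-VP"]))
# → ['B-NP', 'I-NP', 'I-NP', 'B-VP']
```

Each learning step keeps the rule with the largest net gain and stops once no rule reaches `min_gain`; ties go to the first candidate generated. A realistic chunker would score rules over a whole corpus of sentences and use a richer template set, for example one that also conditions on the words themselves.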
The results of this research can be applied in many areas of natural language processing, such as question classification in QA systems, relation assignment, word assignment, statistical machine translation, information extraction, text classification, and parsing of spontaneous speech.
Keywords/Search Tags: natural language processing, chunking, corpus, Support Vector Machine, Transformation-Based Learning