A Study On Chinese Chunk Parsing

Posted on: 2008-12-31
Degree: Master
Type: Thesis
Country: China
Candidate: X B Luo
Full Text: PDF
GTID: 2178360242967599
Subject: Computer software and theory
Abstract/Summary:
Syntactic parsing is an important and difficult task in natural language processing (NLP). Because complete syntactic parsing is hard, chunk parsing has become an attractive alternative to full parsing. Following a divide-and-conquer strategy, syntactic parsing is split into two sub-tasks: chunk parsing and relationship analysis. The main goal of this thesis is to implement Chinese chunk parsing based on morphological analysis, providing a basis for complete syntactic parsing and other NLP tasks.

The thesis first introduces the current state of research on chunk parsing and its significance, then gives the definition and classification of Chinese chunks. Three baseline chunk parsing systems are built, based on a Specialized Hidden Markov Model (HMM), a Support Vector Machine (SVM), and Conditional Random Fields (CRF). In addition, feature extension and voting among the three baseline systems improve the chunk parsing results.

An analysis of the errors in the chunks tagged by the three baseline systems shows that special terms, coordination, and simple part-of-speech (POS) tags are the difficult problems in chunk recognition. A feature extension method is therefore proposed. For higher accuracy, a novel voting method is also proposed.

The experimental results show that the accuracy and recall of the three baseline systems are satisfactory. The F-value is 86.01% for the Specialized HMM, 90.89% for SVM chunking, and 91.08% for CRF. After combining feature extension with the voting method, the F-value of chunking reaches 91.39%.

The chunk parsing approaches introduced in this thesis could be used in a practical machine translation (MT) system, where they can simplify sentence structure and improve overall performance. The research could also be applied to other NLP tasks, such as information retrieval and text classification.
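The voting step and the F-value reported in the abstract can be illustrated with a small sketch. This is not the thesis's novel voting method, which the abstract does not describe; it is plain per-token majority voting over three taggers, with chunk-level F-value computed from exact chunk matches. The IOB2 tag scheme, the English chunk labels (`NP`, `VP`), and the function names `majority_vote`, `extract_chunks`, and `f_value` are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(tag_sequences, fallback=0):
    """Per-token majority vote over several systems' tag sequences.

    When no tag is chosen by more than one system, the tag from the
    system at index `fallback` is used (an assumed tie-breaking rule).
    """
    voted = []
    for tags in zip(*tag_sequences):
        best, count = Counter(tags).most_common(1)[0]
        voted.append(best if count > 1 else tags[fallback])
    return voted


def extract_chunks(tags):
    """Return the set of (start, end, type) chunks in an IOB2 sequence."""
    chunks = set()
    start, ctype = None, None
    # A trailing "O" sentinel closes any chunk still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        # Close the open chunk unless this tag continues it.
        if start is not None and tag != "I-" + ctype:
            chunks.add((start, i, ctype))
            start, ctype = None, None
        if tag.startswith("B-"):
            start, ctype = i, tag[2:]
    return chunks


def f_value(gold_tags, pred_tags):
    """Chunk-level F-value: harmonic mean of precision and recall."""
    gold, pred = extract_chunks(gold_tags), extract_chunks(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical outputs of three chunkers for one four-token sentence.
gold = ["B-NP", "I-NP", "B-VP", "O"]
sys_a = ["B-NP", "I-NP", "B-VP", "O"]
sys_b = ["B-NP", "B-NP", "B-VP", "O"]
sys_c = ["B-NP", "I-NP", "O", "O"]

voted = majority_vote([sys_a, sys_b, sys_c])
print(voted, f_value(gold, voted))
```

In this toy case the vote corrects the individual errors of `sys_b` and `sys_c`, which is the kind of gain a combination of systems is meant to capture.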
Keywords/Search Tags: Natural Language Processing, Chunk Parsing, Specialized HMM, Support Vector Machine, Conditional Random Fields