A Study On Chinese Chunk Parsing

Posted on: 2009-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: J Yu
Full Text: PDF
GTID: 2178360272470496
Subject: Computer software and theory
Abstract/Summary:
Chunk parsing is one of the most important tasks in shallow parsing within Natural Language Processing (NLP). Following a divide-and-conquer strategy, it splits a sentence into smaller units, which simplifies the sentence structure and provides a basis for clarifying the syntactic relationships among those units. As a highly deterministic partial analysis, chunk parsing helps resolve ambiguity in machine translation (MT). It also has important applications in information retrieval, information extraction, text classification and speech recognition.
The main goal of this thesis is to implement Chinese chunk parsing on top of morphological analysis, providing a basis for full syntactic parsing and other NLP tasks. We study the recognition of Chinese chunks with Conditional Random Fields (CRFs) and propose an automatic Chinese chunking method that combines a distributed strategy with an error-driven technique, both based on CRFs.
A single model cannot capture the characteristics of every phrase type, and the same features do not suit all phrases. To overcome this, we propose a distributed approach based on CRFs: the eleven chunk types are divided into groups, a separate CRF model is built for each group, and each model selects the features most sensitive for its own chunks. An error-driven technique based on CRFs is then applied in a second chunking pass to improve the results: it uses the output of the first CRF chunking stage as features in the second stage, learning the error patterns and correcting the errors.
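The second-stage idea described above can be illustrated with a minimal sketch. It assumes per-token feature dictionaries and BIO-style chunk labels, neither of which is specified in the abstract; the function names, feature keys, example sentence and first-stage labels are all hypothetical, and no CRF is trained here. It only shows how first-stage output could be fed back as features.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag); tag set is illustrative


def first_stage_features(sent: List[Token], i: int) -> Dict[str, str]:
    """Hypothetical first-stage features: current word/POS and neighbouring POS."""
    word, pos = sent[i]
    feats = {"w0": word, "p0": pos}
    if i > 0:
        feats["p-1"] = sent[i - 1][1]
    else:
        feats["BOS"] = "1"  # beginning of sentence
    if i < len(sent) - 1:
        feats["p+1"] = sent[i + 1][1]
    else:
        feats["EOS"] = "1"  # end of sentence
    return feats


def second_stage_features(
    sent: List[Token], stage1_labels: List[str], i: int
) -> Dict[str, str]:
    """Assumed error-driven step: add first-stage chunk labels as extra features."""
    feats = first_stage_features(sent, i)
    feats["c0"] = stage1_labels[i]
    if i > 0:
        feats["c-1"] = stage1_labels[i - 1]
    if i < len(sent) - 1:
        feats["c+1"] = stage1_labels[i + 1]
    return feats


# Made-up example: tokens with POS tags and invented first-stage predictions.
sentence = [("我", "r"), ("喜欢", "v"), ("自然", "n"), ("语言", "n")]
stage1 = ["B-NP", "B-VP", "B-NP", "I-NP"]

stage2_input = [
    second_stage_features(sentence, stage1, i) for i in range(len(sentence))
]
print(stage2_input[1])
```

A second-stage model trained on such features, with the gold chunk labels as targets, can in principle learn where the first stage tends to be wrong. In the distributed setting, one such feature function per chunk group would be defined instead of a single shared one.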
Finally, after analyzing how coordinate constructions contribute to chunking errors, we exploit the freedom CRFs allow in feature selection and extract context information about coordinate constructions as features to improve the chunking results further.
The experimental results show that combining the distributed strategy and the error-driven technique with CRFs is effective for Chinese chunking and outperforms both the single CRF-based approach and other hybrid approaches. In the open test, recall, precision and F-measure reach 95.52%, 91.21% and 93.32%, respectively.
The chunk parsing approaches introduced in this thesis could be used in a practical MT system, where they can simplify sentence structure and improve overall performance. The research could also be applied to other NLP tasks, such as information retrieval and text classification.
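As a quick consistency check on the reported figures, the sketch below recomputes the F-measure from the stated precision and recall. It assumes the balanced F-measure (beta = 1), which the abstract does not state explicitly; the function name is illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Open-test figures reported in the abstract, in percent.
precision, recall = 91.21, 95.52
f1 = f_measure(precision, recall)
print(round(f1, 2))  # 93.32
```

Under that assumption the result matches the 93.32% given in the abstract.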
Keywords/Search Tags: Natural Language Processing, Chunk Parsing, Conditional Random Fields, Distributed Strategy