A Study On Chinese Chunk Parsing

Posted on:2009-04-15

Degree:Master

Type:Thesis

Country:China

Candidate:J Yu

Full Text:PDF

GTID:2178360272470496

Subject:Computer software and theory

Abstract/Summary:

Chunk parsing is one of the most important tasks in shallow parsing within Natural Language Processing (NLP). Following a divide-and-conquer strategy, it splits a sentence into smaller units, which simplifies the sentence structure and provides a basis for clarifying the syntactic relationships among those units. Because it yields a highly deterministic partial analysis, chunk parsing helps resolve ambiguity in machine translation. It is also valuable in information retrieval, information extraction, text classification, and speech recognition.

The main goal of this thesis is to implement Chinese chunk parsing on top of morphological analysis and to provide a basis for complete syntactic parsing and other NLP tasks. We study the recognition of Chinese chunks with Conditional Random Fields (CRFs) and propose an automatic Chinese chunking method that combines a distributed strategy with an error-driven technique, both based on CRFs.

A single model cannot capture the characteristics of every phrase type, and the same features do not suit all phrases. To overcome this limitation, we propose a distributed approach based on CRFs: the eleven chunk types are divided into groups, a separate CRF model is built for each group, and each model selects the features that are most sensitive for its own chunk types. An error-driven technique based on CRFs is then applied in a second chunking pass to improve the results: the output of the first CRF chunking stage is used as features in the second stage, which learns the error patterns and corrects the errors.
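The two-stage, error-driven idea can be illustrated with a minimal sketch of the feature construction alone. It assumes BIO-style chunk labels and a dictionary of features per token, neither of which is specified in the abstract. The function names `stage1_features` and `stage2_features`, the one-token context window, and the example sentence with its tags are all hypothetical, and no CRF training is shown.

```python
from typing import Dict, List


def stage1_features(words: List[str], pos: List[str], i: int) -> Dict[str, str]:
    """First-pass features for token i: word and POS tag in a +/-1 window (assumed)."""
    feats = {"w0": words[i], "p0": pos[i]}
    if i > 0:
        feats["w-1"], feats["p-1"] = words[i - 1], pos[i - 1]
    if i < len(words) - 1:
        feats["w+1"], feats["p+1"] = words[i + 1], pos[i + 1]
    return feats


def stage2_features(
    words: List[str], pos: List[str], stage1_labels: List[str], i: int
) -> Dict[str, str]:
    """Second-pass features: first-pass features plus the first-pass chunk labels.

    Feeding the first-stage predictions back in lets a second model learn
    where the first one tends to be wrong.
    """
    feats = stage1_features(words, pos, i)
    feats["c0"] = stage1_labels[i]
    if i > 0:
        feats["c-1"] = stage1_labels[i - 1]
    if i < len(words) - 1:
        feats["c+1"] = stage1_labels[i + 1]
    return feats


# Hypothetical example: the last label would be "I-NP" if the first pass were correct.
words = ["我", "喜欢", "自然", "语言"]
pos = ["r", "v", "n", "n"]
stage1_labels = ["B-NP", "B-VP", "B-NP", "B-NP"]

feats = stage2_features(words, pos, stage1_labels, 3)
```

In this sketch the second-stage model sees, for the final token, both its own first-pass label (`c0`) and its left neighbour's label (`c-1`), which is the kind of context from which a repeated `B-NP` error could be learned and corrected.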
Finally, after analyzing how coordinate constructions contribute to chunking errors, we exploit the freedom that CRFs offer in feature selection and extract context information about coordinate constructions as features to improve the chunking results further.

The experimental results show that combining the distributed strategy with CRFs and the error-driven technique is effective for Chinese chunking and outperforms both the single-CRF approach and other hybrid approaches. In the open test, recall, precision, and F-measure reach 95.52%, 91.21%, and 93.32%, respectively.

The chunk parsing approaches introduced in this thesis could be used in a practical machine translation (MT) system, where they can simplify sentence structure and improve overall performance. The research could also be applied to other NLP tasks, such as information retrieval and text classification.
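The reported F-measure is consistent with the balanced F1 score, the harmonic mean of precision and recall. The short check below assumes that definition, since the abstract does not state the formula; `f_measure` is an illustrative helper name.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F_beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Open-test values reported in the abstract, in percent.
f1 = f_measure(precision=91.21, recall=95.52)
print(round(f1, 2))  # 93.32
```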

Keywords/Search Tags:

Natural Language Processing, Chunk Parsing, Conditional Random Fields, Distributed strategy

Related items

1	Chinese Chunk Identification Research And Application
2	A Study On Chinese Chunk Parsing
3	The Research Of Applying Conditional Random Fields To Chinese Lexical Analysis And Chunk Parsing
4	A Study On Chinese Chunk Parsing
5	A Study On Chinese Location Names Recognition Based On Conditional Random Fields
6	Research On Short Utterance Semantic Recognition Method Based On Cascaded Conditional Random Fields
7	Multi-Task Learning In Conditional Random Fields For Chunking In Shallow Semantic Parsing
8	A Study On Chinese Functional Chunk Parsing
9	Research And Application Of Chinese Word Segmentation Based On Conditional Random Fields
10	Research On Morpheme Analysis Based On Conditional Random Fields In Chinese Natural Language Understanding