Chinese Chunk Identification Research And Application

Posted on:2010-08-23

Degree:Master

Type:Thesis

Country:China

Candidate:Q M Xiao

Full Text:PDF

GTID:2178360302460426

Subject:Computer application technology

Abstract/Summary:


With the development of society and information technology, natural language processing has become increasingly important. Complete syntactic parsing is a key yet difficult problem in natural language processing, so shallow parsing has been proposed as a simplification of complete parsing. As one of the main forms of shallow parsing, chunk parsing simplifies sentence structure by dividing a sentence into smaller units, and it lays the groundwork for many other natural language processing tasks, such as deeper parsing and chunk alignment.

In this paper, a distributed approach based on CRFs (conditional random fields) and an error-driven technique is used to implement Chinese chunk parsing. First, following a divide-and-conquer strategy, the chunks are divided into groups; appropriate single and combined features are then selected for each group, and chunks are recognized with CRFs. The results of this first recognition pass are added to the template for a second recognition pass. When merging the recognition results and resolving conflicts between groups, the F-value and the number factor of each group are considered together to determine the merging priority. This further improves the overall recognition results and is particularly effective for noun chunk recognition.

Finally, building on noun chunk parsing, a new algorithm based on window matching is proposed for noun chunk alignment. The noun phrase translations obtained through word alignment with the GIZA++ software are taken as initial translations; bilingual phrase dictionaries and word dictionaries are then used to revise these translations or to find alternative ones.

Experimental results show that the CRF-based distributed strategy, compared with using CRFs alone, shortens training time and improves recognition results. Adding the error-driven technique improves the results further, reaching an F-value of 92.23% in open testing. Noun chunk alignment also performs well, with an accuracy of 88.01%.
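The merging step described in the abstract, where each group's recognition results are combined and conflicts are settled by a priority derived from the group's F-value and number factor, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's actual algorithm: the abstract does not say how the two factors are combined, so the weighted sum with the hypothetical weight `alpha`, the reading of the "number factor" as a group's chunk count, and the names `Chunk`, `GroupResult`, `group_priority`, and `merge_groups` are all illustrative inventions. Overlapping chunks are resolved greedily in favor of the higher-priority group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    start: int  # index of the first token (inclusive)
    end: int    # index after the last token (exclusive)
    label: str  # chunk type, e.g. "NP" or "VP"


@dataclass
class GroupResult:
    name: str
    f_value: float      # F-value of this group's CRF model on held-out data, in [0, 1]
    chunk_count: int    # ASSUMPTION: the "number factor" read as the group's chunk count
    chunks: list[Chunk]  # chunks this group's model recognized in one sentence


def group_priority(group: GroupResult, max_count: int, alpha: float = 0.7) -> float:
    """HYPOTHETICAL priority: weighted sum of F-value and relative chunk count."""
    count_share = group.chunk_count / max_count if max_count else 0.0
    return alpha * group.f_value + (1 - alpha) * count_share


def overlaps(a: Chunk, b: Chunk) -> bool:
    """True if two half-open token spans share at least one token."""
    return a.start < b.end and b.start < a.end


def merge_groups(groups: list[GroupResult], alpha: float = 0.7) -> list[Chunk]:
    """Merge per-group results; on conflict, the higher-priority group wins."""
    max_count = max((g.chunk_count for g in groups), default=0)
    ranked = sorted(groups, key=lambda g: group_priority(g, max_count, alpha),
                    reverse=True)
    accepted: list[Chunk] = []
    for group in ranked:
        for chunk in group.chunks:
            # Keep a chunk only if it does not overlap any chunk already accepted.
            if not any(overlaps(chunk, kept) for kept in accepted):
                accepted.append(chunk)
    return sorted(accepted, key=lambda c: c.start)


# Usage: an NP group and a VP group propose conflicting spans for one sentence.
np_group = GroupResult("NP", 0.93, 500, [Chunk(0, 2, "NP"), Chunk(4, 6, "NP")])
vp_group = GroupResult("VP", 0.88, 200, [Chunk(1, 3, "VP"), Chunk(2, 4, "VP")])
merged = merge_groups([np_group, vp_group])
# The NP group ranks higher, so VP (1, 3) is dropped and VP (2, 4) fits between the NPs.
print(merged)
```

A greedy, group-by-group pass keeps the sketch simple; a fuller treatment might instead score individual chunks (for example by CRF marginal probability) rather than whole groups.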

Keywords/Search Tags:

Natural Language Processing, Chunk Parsing, Chunk Alignment, Distributed Approach, Conditional Random Fields


Related items

1	A Study On Chinese Chunk Parsing
2	A Study On Chinese Chunk Parsing
3	The Research Of Applying Conditional Random Fields To Chinese Lexical Analysis And Chunk Parsing
4	A Study On Chinese Chunk Parsing
5	A Study On Chinese Functional Chunk Parsing
6	Multi-Task Learning In Conditional Random Fields For Chunking In Shallow Semantic Parsing
7	A Study On The Computation Of Chinese Chunks
8	Research On Chinese Parsing Based On Semantic Analysis And Its Implementation
9	Study Of Chunk System Oriented To Sentence Parsing
10	A Research On Identification Of Chinese Prosodic Phrase Boundary Based On Chinese Chunk