
Chinese Chunk Identification Research And Application

Posted on: 2010-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: Q M Xiao
Full Text: PDF
GTID: 2178360302460426
Subject: Computer application technology
Abstract/Summary:
With the development of society and information technology, natural language processing has become increasingly important. Complete syntactic parsing is a key but difficult problem in natural language processing, so shallow parsing has been proposed as a simpler alternative. As one of the main forms of shallow parsing, chunk parsing simplifies sentence structure by dividing a sentence into smaller units, and it lays the groundwork for many other natural language processing tasks, such as deeper parsing and chunk alignment.

This thesis implements Chinese chunk parsing with a distributed approach based on conditional random fields (CRFs) and error-driven technology. First, following a divide-and-conquer strategy, the chunk types are divided into groups; appropriate single and combined features are selected for each group, and chunk recognition is carried out for each group with CRFs. The results of this first recognition pass are then added to the feature template of a second recognition pass. When the recognition results of the groups are merged and conflicts between them must be resolved, the F-value and the number factor of each group are considered together to determine the merge priority. This further improves the overall recognition results and is particularly effective for noun chunk recognition.

Finally, building on noun chunk parsing, a new algorithm based on window matching is proposed for noun chunk alignment. The translation of each noun phrase obtained from word alignment with the GIZA++ software is taken as the initial translation; a bilingual phrase dictionary and a word dictionary are then used to revise that translation or to find an alternative.

Experimental results show that, compared with a single CRF model, the CRF-based distributed strategy shortens training time and improves recognition results. Adding the error-driven technology improves the results further, giving an F-value of 92.23% in the open test. Noun chunk alignment also performs well, with an accuracy of 88.01%.
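
The merge step described in the abstract, where conflicting chunks from different groups are resolved by a per-group priority, can be illustrated with a minimal sketch. The abstract does not give the formula that combines the F-value and the number factor, so everything here is an assumption made for illustration: the `Chunk` type, the `group_priority` weighting (a simple weighted sum, with the number factor read as the group's share of chunks), and the greedy overlap rule are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    start: int   # index of the first token (inclusive)
    end: int     # index one past the last token (exclusive)
    label: str   # chunk type, e.g. "NP"
    group: str   # name of the group model that proposed this chunk


def overlaps(a: Chunk, b: Chunk) -> bool:
    """Two chunks conflict when their token spans intersect."""
    return a.start < b.end and b.start < a.end


def group_priority(f_value: float, count: int, total: int, weight: float = 0.5) -> float:
    """Hypothetical priority: weighted sum of a group's F-value and its
    share of all chunks (an assumed reading of the 'number factor')."""
    return weight * f_value + (1.0 - weight) * (count / total)


def merge_chunks(candidates: list[Chunk], priority: dict[str, float]) -> list[Chunk]:
    """Greedy merge: visit chunks from highest to lowest group priority
    and keep a chunk only if it overlaps nothing already kept."""
    ordered = sorted(candidates, key=lambda c: (-priority[c.group], c.start))
    kept: list[Chunk] = []
    for chunk in ordered:
        if all(not overlaps(chunk, other) for other in kept):
            kept.append(chunk)
    return sorted(kept, key=lambda c: c.start)


if __name__ == "__main__":
    # Made-up figures, purely to exercise the functions.
    priority = {
        "g1": group_priority(0.93, 600, 1000),
        "g2": group_priority(0.88, 400, 1000),
    }
    candidates = [
        Chunk(0, 2, "NP", "g1"),
        Chunk(1, 3, "VP", "g2"),  # conflicts with the g1 chunk above
        Chunk(3, 4, "NP", "g2"),
    ]
    for chunk in merge_chunks(candidates, priority):
        print(chunk)
```

In this sketch a chunk from a lower-priority group is discarded whenever it overlaps a chunk from a higher-priority group; non-conflicting chunks from every group are kept.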
Keywords/Search Tags: Natural Language Processing, Chunk Parsing, Chunk Alignment, Distributed Approach, Conditional Random Fields