
The Research Of Applying Conditional Random Fields To Chinese Word Segmentation And Part-Of-Speech Tagging

Posted on: 2009-11-27 Degree: Master Type: Thesis
Country: China Candidate: C Y Yu
GTID: 2178360242974992 Subject: Computer application technology
Abstract/Summary:
As information technology advances rapidly, people hope to communicate with computers in natural language, just as they communicate with one another. Natural language understanding is an interesting and challenging task. From the viewpoint of computer science, and of artificial intelligence in particular, the goal of natural language understanding is to build a computational model that can understand, analyze, and answer questions as humans do. Chinese natural language processing is the core technology for enabling computers to understand Chinese. Chinese syntactic parsing is an important problem in Chinese information processing, and progress on it can also promote the development of related areas of linguistics. The core work of this thesis can be summarized in three aspects:
(1) The thesis introduces the principle of Maximum Entropy and its significance for natural language understanding research. It then discusses in detail the definition of Conditional Random Fields (CRFs), which are strongly motivated by the maximum entropy principle. The CRF model relaxes the strong independence assumptions that generative models such as the Hidden Markov Model (HMM) must make, and it overcomes the label bias problem exhibited by the Maximum Entropy Markov Model (MEMM) and other discriminative models that normalize probabilities state by state.
(2) Existing algorithms and models for Chinese word segmentation and part-of-speech tagging are compared and synthesized. Building on this research, and in comparison with several traditional models, a CRF-based Chinese word segmentation method is adopted, which improves the precision of the analysis.
(3) Based on the characteristics of Chinese word segmentation and the features used in CRFs, a set of CRF feature templates is determined, with particular attention to how ambiguous words and unknown (out-of-vocabulary) words are segmented.
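Point (3) treats segmentation as character-level sequence labeling driven by feature templates. The abstract does not list the thesis's actual tag set or templates, so the following is only a minimal sketch under stated assumptions: it uses the common four-tag BMES scheme (Begin, Middle, End of a multi-character word, Single-character word) and a hypothetical window of unigram and bigram character templates, similar in spirit to CRF++-style `%x[row,col]` templates. The names `words_to_tags`, `TEMPLATES`, and `extract_features` are illustrative, not taken from the thesis.

```python
from typing import List, Tuple

# Assumed BMES tag scheme (not confirmed by the thesis):
# B = first char of a multi-char word, M = middle char, E = last char, S = single-char word.
def words_to_tags(words: List[str]) -> List[str]:
    """Convert a gold segmentation into one BMES tag per character."""
    tags: List[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags

# Hypothetical feature templates: (template name, character offsets relative to position i).
TEMPLATES: List[Tuple[str, Tuple[int, ...]]] = [
    ("C-2", (-2,)),
    ("C-1", (-1,)),
    ("C0", (0,)),
    ("C1", (1,)),
    ("C2", (2,)),
    ("C-1C0", (-1, 0)),
    ("C0C1", (0, 1)),
    ("C-1C1", (-1, 1)),
]

# Padding symbols returned for any offset that falls outside the sentence.
PAD_LEFT, PAD_RIGHT = "<S>", "</S>"

def char_at(sentence: str, i: int) -> str:
    """Return the character at index i, or a padding symbol if i is out of range."""
    if i < 0:
        return PAD_LEFT
    if i >= len(sentence):
        return PAD_RIGHT
    return sentence[i]

def extract_features(sentence: str, i: int) -> List[str]:
    """Instantiate every template at position i as a 'name=value' string."""
    feats: List[str] = []
    for name, offsets in TEMPLATES:
        value = "/".join(char_at(sentence, i + o) for o in offsets)
        feats.append(f"{name}={value}")
    return feats

if __name__ == "__main__":
    words = ["我", "爱", "北京", "天安门"]
    sentence = "".join(words)
    print(words_to_tags(words))          # one BMES tag per character
    print(extract_features(sentence, 2)) # features for the character 北
```

For 我/爱/北京/天安门 the tags are `S S B E B M E`, and the features at 北 include `C-1C0=爱/北` and `C0C1=北/京`. In a CRF, each such string would typically be paired with every candidate tag to form the binary feature functions whose weights are learned; bigram templates like these are one common way to give the model context for ambiguous and unknown words.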
We analyzed, designed, and implemented a Chinese word segmentation and part-of-speech tagging module based on the Conditional Random Fields model.
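Inside such a module, a trained linear-chain CRF labels a sentence by finding the tag sequence with the highest total score, since the globally normalized probability is proportional to the exponentiated sum of feature weights. The sketch below shows that decoding step with the Viterbi algorithm over BMES tags and then converts the tags back into words. It is a hypothetical illustration, not the thesis's implementation: the emission scores in `EXAMPLE_EMISSIONS` are made-up numbers standing in for the weighted feature sums of a trained model, and transitions are simply restricted to the sequences the BMES scheme allows.

```python
from typing import Dict, List, Sequence, Tuple

TAGS = ["B", "M", "E", "S"]
NEG_INF = float("-inf")

# Tag bigrams that are legal under the (assumed) BMES scheme.
ALLOWED = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
           ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}

def bmes_transitions(score: float = 0.0) -> Dict[Tuple[str, str], float]:
    """Transition scores: `score` for legal bigrams, -inf for illegal ones."""
    return {(a, b): (score if (a, b) in ALLOWED else NEG_INF) for a in TAGS for b in TAGS}

def viterbi(emissions: Sequence[Dict[str, float]],
            trans: Dict[Tuple[str, str], float],
            start: Tuple[str, ...] = ("B", "S"),
            end: Tuple[str, ...] = ("E", "S")) -> List[str]:
    """Return the highest-scoring tag sequence (sum of emission and transition scores)."""
    n = len(emissions)
    if n == 0:
        return []
    # delta[t][y]: best score of any tag prefix ending with tag y at position t.
    delta = [{y: (emissions[0][y] if y in start else NEG_INF) for y in TAGS}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, n):
        row: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for y in TAGS:
            prev = max(TAGS, key=lambda p: delta[t - 1][p] + trans[(p, y)])
            row[y] = delta[t - 1][prev] + trans[(prev, y)] + emissions[t][y]
            ptr[y] = prev
        delta.append(row)
        back.append(ptr)
    # The last tag must close a word; then follow the back-pointers.
    path = [max(end, key=lambda y: delta[-1][y])]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

def tags_to_words(chars: str, tags: Sequence[str]) -> List[str]:
    """Group characters into words, closing a word at every E or S tag."""
    words, buf = [], ""
    for c, tag in zip(chars, tags):
        buf += c
        if tag in ("E", "S"):
            words.append(buf)
            buf = ""
    if buf:
        words.append(buf)
    return words

# Hypothetical per-character scores; at 京 the local maximum (B) is not the best global choice.
EXAMPLE_SENTENCE = "我爱北京天安门"
EXAMPLE_EMISSIONS = [
    {"B": 0.0, "M": 0.0, "E": 0.0, "S": 2.0},  # 我
    {"B": 0.0, "M": 0.0, "E": 0.0, "S": 2.0},  # 爱
    {"B": 2.0, "M": 0.0, "E": 0.0, "S": 0.0},  # 北
    {"B": 1.2, "M": 0.0, "E": 1.0, "S": 0.0},  # 京
    {"B": 2.0, "M": 0.0, "E": 0.0, "S": 0.0},  # 天
    {"B": 0.0, "M": 2.0, "E": 0.0, "S": 0.0},  # 安
    {"B": 0.0, "M": 0.0, "E": 2.0, "S": 0.0},  # 门
]

if __name__ == "__main__":
    tags = viterbi(EXAMPLE_EMISSIONS, bmes_transitions())
    print(tags, tags_to_words(EXAMPLE_SENTENCE, tags))
```

Here 京 scores slightly higher as B, but choosing B there would force illegal or low-scoring tags on its neighbours, so the decoder picks E and returns 我/爱/北京/天安门. This whole-sequence decision is what distinguishes a globally normalized CRF from per-position classification. Part-of-speech tagging could presumably reuse the same decoder with a POS label set (or joint labels such as segmentation tag plus POS) in place of `TAGS`, but the abstract does not say which design the thesis used.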
Keywords/Search Tags: Natural language processing, Chinese Word Segmentation, Part-Of-Speech Tagging, Conditional Random Fields