A Study On Chinese Location Names Recognition Based On Conditional Random Fields

Posted on: 2010-12-07 | Degree: Master | Type: Thesis
Country: China | Candidate: L Ma | Full Text: PDF
GTID: 2178360302460315 | Subject: Computer application technology
Abstract/Summary:
Chinese location name recognition is part of Chinese Named Entity Recognition (NER), a fundamental research area in Chinese Natural Language Processing (NLP) and the basis of NLP tasks such as machine translation, information retrieval, and question answering. Location names account for a large proportion of Chinese named entities, and because of their characteristics, recognizing them still poses many difficult problems.

Building on existing research, this thesis studies the automatic recognition of Chinese location names using the Conditional Random Fields (CRFs) model, aiming to make Chinese location name recognition more effective. The main work can be summarized in two aspects.

First, the thesis briefly introduces the Hidden Markov Model (HMM) and the Maximum Entropy Markov Model (MEMM), and then the CRFs model, which developed from maximum entropy theory. CRFs are a comparatively better conditional probability model: they do not require the independence assumption that characterizes HMMs, they reduce the label bias problem of MEMMs, and they can use contextual features to obtain a globally optimal labeling.

Second, traditional location name recognition methods use only a single-layer CRFs model, which can hardly capture long-distance features. To recognize named entities that depend on non-local information, the thesis proposes a two-tier CRFs model. Location name recognition is cast as a sequence labeling problem. Based on the characteristics of Chinese location names, the features are divided into three groups: local features, non-local features, and dictionary features. Meanwhile, location names are extracted from the training corpus to build an initial location name dictionary. The features are then used to train the first-layer CRFs, which is run on the test corpus, and its results are added to the initial dictionary. The second-layer CRFs uses the non-local features and obtains the dictionary features through the Maximum Matching Method.

The main contributions of this thesis are using a two-tier CRFs model to capture long-distance features for recognizing Chinese location names, solving the label bias problem, and implementing a location name recognition system that makes effective use of existing research methods. Experimental results show that location name recognition based on two-tier conditional random fields can effectively improve Chinese location name recognition.
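
The abstract says the second layer obtains its dictionary features through the Maximum Matching Method but gives no further detail. As a rough illustration only, the following sketch assumes forward maximum matching over a location name dictionary and a per-character B/I/O encoding of the matches. The function name `max_match_dict_features`, the tag names `B-DICT`, `I-DICT`, and `O`, and the sample dictionary are all hypothetical and not taken from the thesis.

```python
def max_match_dict_features(sentence, dictionary, max_len=None):
    """Tag each character by forward maximum matching against a dictionary.

    Hypothetical encoding (an assumption, not the thesis's scheme):
    B-DICT = first character of a dictionary match,
    I-DICT = later character of the same match,
    O      = character not covered by any match.
    """
    if max_len is None:
        # Longest dictionary entry bounds the candidate window.
        max_len = max((len(word) for word in dictionary), default=0)

    tags = ["O"] * len(sentence)
    i = 0
    while i < len(sentence):
        matched = 0
        # Try the longest candidate first, then shrink one character at a time.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            if sentence[i:i + length] in dictionary:
                matched = length
                break
        if matched:
            tags[i] = "B-DICT"
            for j in range(i + 1, i + matched):
                tags[j] = "I-DICT"
            i += matched  # continue after the matched name
        else:
            i += 1  # no entry starts here; move on one character
    return tags


# Illustrative dictionary, standing in for the one built from the
# training corpus and the first-layer output.
location_dict = {"北京", "北京市", "海淀区"}
features = max_match_dict_features("我住在北京市海淀区", location_dict)
```

In this sketch the longest entry wins, so "北京市" is matched as a whole rather than as "北京" followed by an uncovered "市". The resulting tags could be supplied to a sequence labeler as one extra feature column per character.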
Keywords/Search Tags:Natural Language Processing, Named Entity Recognition, Two-tier Model, Conditional Random Fields