A Study On Chinese Place Names Recognition

Posted on: 2014-07-20
Degree: Master
Type: Thesis
Country: China
Candidate: H Sun
Full Text: PDF
GTID: 2268330401977058
Subject: Computer Science and Technology
Abstract/Summary:
Named entity recognition is one of the key basic tasks in natural language processing. Chinese place names, an important class of named entities, are complex and diverse in structure, which makes Chinese place name recognition a difficult problem in natural language processing. Building on an in-depth study of machine learning models, this paper adopts the conditional random field (CRF) model for Chinese place name recognition.

First, this paper summarizes the definition of named entities, the task of Chinese place name recognition, the research background, and the current state of research, and then analyzes the existing methods. After surveying the current mainstream methods, this paper chooses the conditional random field to identify place names. The conditional random field, an effective statistical learning method, is not restricted by the independence assumption of the hidden Markov model and does not suffer from the label bias problem of the maximum entropy model.

The identification of Chinese place names can be converted into a sequence tagging problem; consequently, the correct tagging of the training and test sets directly affects recognition performance. Most existing recognition models use the ICTCLAS system to segment the words in the corpus. Because that system makes some place name segmentation mistakes, the overall recognition performance is low. To deal with this problem, this paper builds a place name lexicon and adds it to the user lexicon of the ICTCLAS system to improve the accuracy of place name segmentation.

Although the conditional random field is a strong machine learning model, slow convergence and long training time have been the main problems in practice, so it is especially important to select appropriate, refined features. Given the complexity and diversity of the structure of Chinese place names, and building on previous research, this paper chooses appropriate features and adopts an incremental learning strategy to filter the feature templates, which ultimately improves recognition performance.

The experimental results show that the proposed Chinese place name recognition method is effective. In an open test on the 1998 People's Daily corpus, the accuracy, recall, and F value are 95.34%, 89.28%, and 92.29%, respectively.
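
To make the sequence tagging formulation concrete, here is a minimal Python sketch of how annotated place names might be encoded as per-token labels and how span-level scores could be computed from a predicted tag sequence. It is an illustration only: the `B-LOC`/`I-LOC`/`O` tag names, the function names, the toy sentence, and the exact-match scoring are assumptions, not the tag set or evaluation procedure used in the thesis, and no CRF is trained here.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode place name spans as BIO tags (assumed tag set: B-LOC, I-LOC, O)."""
    tags = ["O"] * n_tokens
    for start, end in spans:
        tags[start] = "B-LOC"
        for i in range(start + 1, end):
            tags[i] = "I-LOC"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence back into place name spans."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B-LOC":
            if start is not None:
                spans.append((start, i))  # close the previous span
            start = i
        elif tag == "I-LOC" and start is not None:
            continue  # still inside the current span
        else:
            if start is not None:
                spans.append((start, i))
            start = None  # "O", or a stray I-LOC with no opening B-LOC
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def precision_recall_f(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Exact-match span scoring; F is the harmonic mean of precision and recall."""
    gold_set, pred_set = set(gold), set(pred)
    correct = len(gold_set & pred_set)
    p = correct / len(pred_set) if pred_set else 0.0
    r = correct / len(gold_set) if gold_set else 0.0
    f = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
    return p, r, f


if __name__ == "__main__":
    # Toy segmented sentence (hypothetical): "他 从 上海 到 北京 市"
    tokens = ["他", "从", "上海", "到", "北京", "市"]
    gold_spans = [(2, 3), (4, 6)]  # 上海, 北京市
    gold_tags = spans_to_bio(len(tokens), gold_spans)
    # Hypothetical tagger output that misses the final token of 北京市
    pred_tags = ["O", "O", "B-LOC", "O", "B-LOC", "O"]
    print(list(zip(tokens, gold_tags)))
    print(precision_recall_f(gold_spans, bio_to_spans(pred_tags)))
```

In this toy case the tagger finds 上海 exactly but truncates 北京市, so only one of two predicted spans matches the gold annotation. This shows why segmentation and tagging errors in the training and test data feed directly into the reported scores.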
Keywords/Search Tags: natural language processing, Chinese place names recognition, CRF model