
Research And Application Of Chinese Named Entity Recognition Based On CRF Model

Posted on: 2011-04-04 | Degree: Master | Type: Thesis
Country: China | Candidate: S F Li | Full Text: PDF
GTID: 2178360332458117 | Subject: Computer Science and Technology
Abstract/Summary:
In a text, named entities are the main carriers of information and express its core content. Chinese named entity recognition (NER) is an important basis for natural language processing applications such as information extraction, summarization, parsing, open-domain question answering, and machine translation, and it has received increasing attention. Because of characteristics specific to the Chinese language, Chinese NER has not yet achieved satisfactory results. Research on Chinese NER technology is therefore essential, both to improve recognition performance and to advance other technologies and applications in Chinese information processing.
There are two main approaches to Chinese NER: rule-based and statistics-based. This thesis adopts the statistical approach and focuses on Chinese NER based on the Conditional Random Fields (CRFs) model. The advantages and disadvantages of several statistical models, namely the hidden Markov model (HMM), the maximum entropy model (ME), the maximum entropy Markov model (MEMM), and CRFs, are discussed theoretically. Among them, HMM relies on strong independence assumptions, ME lacks the Markov property, and MEMM suffers from the label bias problem, whereas CRFs handle the problems of these earlier models well. In addition, CRFs are studied in depth, especially with respect to feature extraction and parameter estimation.
For CRF-based Chinese NER, the feature template has a major influence on recognition performance. Building on previous work, and through feature refinement and experiments, this thesis proposes a feature template that distinguishes each type of named entity well. The template contains basic features, prefix and suffix features, dictionary features, and complex features.
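To make the four feature groups concrete, here is a minimal sketch of template-based feature extraction for character-level Chinese NER. It is not the thesis's actual template: the window size, the feature names (`C[k]`, `PREFIX_SURNAME`, `SUFFIX_LOC`, `SUFFIX_ORG`, `DICT_LOC`), and the tiny surname, suffix, and location lexicons are all hypothetical choices made for illustration.

```python
# Hypothetical sketch: template-based features for character-level Chinese NER.
# The window size, feature names, and tiny lexicons below are illustrative
# assumptions, not the template actually proposed in the thesis.
from typing import Dict, Set

PAD = "<PAD>"
SURNAMES: Set[str] = {"李", "王", "张"}        # prefix cue for person names (assumed)
LOC_SUFFIXES: Set[str] = {"省", "市", "县"}    # suffix cue for locations (assumed)
ORG_SUFFIXES: Set[str] = {"公司", "大学"}      # suffix cue for organizations (assumed)
LOC_LEXICON: Set[str] = {"北京", "上海"}       # toy location dictionary (assumed)


def char_at(sent: str, i: int) -> str:
    """Character at position i, or a padding symbol outside the sentence."""
    return sent[i] if 0 <= i < len(sent) else PAD


def dict_position(sent: str, i: int, lexicon: Set[str]) -> str:
    """B/M/E/S position of sent[i] inside a lexicon match (longest word first), else O."""
    for w in sorted(lexicon, key=len, reverse=True):
        n = len(w)
        # Only start offsets whose match would cover position i are tried.
        for start in range(max(0, i - n + 1), min(i, len(sent) - n) + 1):
            if sent[start:start + n] == w:
                if n == 1:
                    return "S"
                if i == start:
                    return "B"
                return "E" if i == start + n - 1 else "M"
    return "O"


def extract_features(sent: str, i: int, window: int = 2) -> Dict[str, str]:
    """Build the feature dictionary for the character at position i."""
    feats: Dict[str, str] = {}
    # Basic features: single characters in a +/- window around position i.
    for k in range(-window, window + 1):
        feats[f"C[{k}]"] = char_at(sent, i + k)
    # Complex (combined) features: adjacent character bigrams in the window.
    for k in range(-window, window):
        feats[f"C[{k}]C[{k + 1}]"] = char_at(sent, i + k) + char_at(sent, i + k + 1)
    # Prefix and suffix features.
    feats["PREFIX_SURNAME"] = str(sent[i] in SURNAMES)
    feats["SUFFIX_LOC"] = str(sent[i] in LOC_SUFFIXES)
    feats["SUFFIX_ORG"] = str(any(sent[: i + 1].endswith(s) for s in ORG_SUFFIXES))
    # Dictionary feature: position of the character inside a lexicon match.
    feats["DICT_LOC"] = dict_position(sent, i, LOC_LEXICON)
    return feats


if __name__ == "__main__":
    s = "李明在北京大学工作"
    for i in range(len(s)):
        print(s[i], extract_features(s, i))
```

Each dictionary describes one character; a sentence becomes a list of such dictionaries, which is the input shape expected by CRF toolkits that accept per-token feature dictionaries (sklearn-crfsuite is one example). Keeping the dictionary feature as a B/M/E/S position rather than a plain yes/no lets the CRF learn where entity boundaries fall inside a dictionary match.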
Meanwhile, features were extracted, for the first time, from different proportions of the dictionaries, which greatly improved recognition performance. The final F-measure reached 91.27%, even higher than the first-place results of the SIGHAN 2006 bakeoff evaluation.
Finally, using the results of this research on CRF-based Chinese NER, an online Chinese named entity recognition and hotspot ranking system was implemented. The system first cleans the text of web pages, recognizes the named entities in the resulting documents, counts how many times each person appears, and then sorts the counts and displays the top-ranked hotspots.
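The evaluation metric and the final counting-and-sorting step of the hotspot system can be sketched as follows. This is a hypothetical illustration: the abstract does not state the tag set or scoring script, so a BIO scheme with PER/LOC/ORG types and exact-match entity scoring are assumed, and web-page cleaning and CRF tagging are assumed to have already produced `(text, tags)` pairs. The names `bio_to_spans`, `prf`, and `rank_persons` are invented for this sketch.

```python
# Hypothetical sketch: entity-level F-measure and a person "hotspot" ranking.
# The BIO tag scheme with PER/LOC/ORG types is an assumption; the thesis
# abstract does not specify its tag set or scoring script.
from collections import Counter
from typing import List, Sequence, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: Sequence[str]) -> List[Span]:
    """Convert a BIO tag sequence into (start, end, type) entity spans."""
    spans: List[Span] = []
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final entity
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue  # the current entity continues
        if start is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag != "O":  # B-X, or a stray I-X, opens a new entity
            start, etype = i, tag[2:]
    return spans


def prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision P, recall R, and F-measure F = 2PR / (P + R)."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def rank_persons(docs: Sequence[Tuple[str, Sequence[str]]],
                 top_k: int = 3) -> List[Tuple[str, int]]:
    """Count person-name mentions across tagged documents and return the top_k."""
    counts: Counter = Counter()
    for text, tags in docs:
        for s, e, etype in bio_to_spans(tags):
            if etype == "PER":
                counts[text[s:e]] += 1
    # most_common sorts by count; ties keep first-encountered order.
    return counts.most_common(top_k)


if __name__ == "__main__":
    docs = [
        ("李明在北京", ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]),
        ("王芳见李明", ["B-PER", "I-PER", "O", "B-PER", "I-PER"]),
    ]
    print(rank_persons(docs, top_k=2))  # [('李明', 2), ('王芳', 1)]
```

Scoring on exact `(start, end, type)` spans means a location tagged one character short counts as both a false positive and a false negative, which is the strict convention commonly used for NER F-measure; whether the thesis used this convention is not stated in the abstract.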
Keywords/Search Tags: named entity, CRF, feature, feature extraction, dictionary