
A Study On Chinese Location Names Recognition Based On CRF

Posted on: 2011-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: W P Liao
Full Text: PDF
GTID: 2178330332961517
Subject: Computer application technology
Abstract/Summary:
Chinese Named Entity Recognition (CNER) is one of the key technologies of a Chinese word segmentation system. Named Entity Recognition is the foundation of machine translation, information retrieval, and question answering systems. Location name recognition (LNR) is a difficult task in the field of Named Entity Recognition. This thesis focuses on Chinese location name recognition (CLNR). We implement a recognition system that combines Conditional Random Field (CRF) and Support Vector Machine (SVM) models. We also study in depth a method that combines the CRF model with rules.

Among the machine learning models currently in use, the CRF model performs particularly well. It combines the advantages of the Hidden Markov Model (HMM) and the Maximum Entropy Markov Model (MEMM), and it can use contextual features to obtain a globally optimal tagging result, so it suits the CLNR task. Building on previous research, this thesis analyzes the characteristics of Chinese location names (CLN) and selects suitable features. To improve the performance of the machine learning model and the results of the CRF, we employ an incremental learning strategy to determine the feature template.

An analysis of the marginal probabilities of the recognition results shows that erroneous labels are related to low marginal probabilities, so some of the erroneous labels can be found by looking for low marginal probabilities. This thesis uses an SVM to re-recognize the parts that have low marginal probabilities, which improves the performance of our system. The experimental results show that this method clearly outperforms a single CRF model.

Some of the CRF's erroneous labels clearly violate grammatical and semantic rules, because machine learning models cannot express the certainties of language. A rule-based method can overcome this shortcoming. By analyzing the grammar and semantics of the CRF's recognition errors and classifying them, this thesis adds a number of hand-written rules to correct the CRF's results, and the results improve accordingly.

Experiments show that this approach is effective. In an open test on the MSRA corpus of the BAKEOFF-3 task, the combined CRF and SVM system achieves recall, precision, and F-value of 92.39%, 91.33%, and 91.86% respectively, and the combined CRF and rules system achieves 94.67%, 92.35%, and 93.50% respectively.
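
The idea of flagging labels by low marginal probability, and the relation between the reported precision, recall, and F-value, can be sketched in a few lines of Python. This is only an illustration, not the thesis's implementation: the function names, the toy marginals, the 0.9 threshold, and the label set (B-LOC, I-LOC, O) are all assumptions, and the F-value is assumed to be the usual harmonic mean of precision and recall.

```python
def low_confidence_positions(marginals, threshold=0.9):
    """Return indices of tokens whose most probable label has a
    marginal probability below `threshold` (threshold is an assumption)."""
    return [i for i, dist in enumerate(marginals)
            if max(dist.values()) < threshold]


def f_value(precision, recall):
    """Balanced F-value: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-token marginal distributions, in the form a CRF
# toolkit might report them (one dict of label -> probability per token).
marginals = [
    {"B-LOC": 0.97, "I-LOC": 0.01, "O": 0.02},
    {"B-LOC": 0.02, "I-LOC": 0.55, "O": 0.43},  # uncertain token
    {"B-LOC": 0.01, "I-LOC": 0.01, "O": 0.98},
]

# Tokens at these positions would be handed to a second classifier
# (an SVM in the thesis) instead of keeping the CRF label.
suspect = low_confidence_positions(marginals, threshold=0.9)
print(suspect)

# Consistency check of the reported figures (precision, recall in %).
print(round(f_value(91.33, 92.39), 2))  # CRF + SVM system
print(round(f_value(92.35, 94.67), 2))  # CRF + rules system
```

Under the harmonic-mean assumption, the reported precision and recall reproduce the reported F-values of 91.86% and 93.50% after rounding to two decimals.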
Keywords/Search Tags:Natural Language Processing, Named Entity Recognition, Conditional Random Fields, Rules