Chinese Named Entity Recognition Method Based On Independent Reasoning Research

Posted on: 2013-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: B Li
Full Text: PDF
GTID: 2248330374471768
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of computer technology and its applications, people face a massive volume of news every day. What they actually care about, however, is the useful knowledge, that is, the information, that can be extracted from this news. Named Entity Recognition (NER) is a necessary step in the computer processing of news text as it moves from syntactic structure analysis to semantic analysis. Improving the recall and precision of NER provides the foundation and support for downstream applications such as machine translation and question answering. As the demands placed on information processing grow, high-quality NER output has become a basic requirement, because these applications depend on it. Based on the characteristics of Chinese Named Entity Recognition (CNER), NER methods fall mainly into three classes: rule-based methods, statistics-based methods, and hybrid methods that combine rules with statistics. In recent years, work on CNER has combined machine learning with hand-crafted knowledge. This thesis concentrates on three categories of named entity: Chinese person names, location names, and organization names.
Considering the respective strengths and weaknesses of the two traditional approaches, statistical and rule-based, and following the currently popular idea of integrating machine learning models with hand-crafted knowledge, this thesis presents an automatic approach to CNER based on transformation-based learning (TBL) and Hownet. The basic idea is as follows. First, using a labeled corpus, the context of each named entity is tagged with different roles according to the function each context element plays in recognition. Second, templates are extracted from the role-tagged instances and divided into high-frequency and intermediate-frequency templates; for the intermediate-frequency templates, the TBL method is combined with Hownet to extract suitable rules. Finally, the rule set is combined with the high-frequency templates, augmented with Hownet, to recognize and annotate named entities in text. In this thesis, Hownet is used to build the semantic corpus. Its role is embodied in three main aspects. The first is to use the positional relationships among Hownet sememes to raise the level of abstraction of template keywords and thereby reduce the number of templates. The second is to compensate for the data sparsity caused by an insufficient corpus, by fully exploiting the semantic features of related words without expanding the corpus. The third is to use the proper-noun sememes in Hownet, such as place names, person names, organization names and others, to build a semantic vocabulary that serves as a clear reference for NER; the type of a named entity is then judged from the semantic constraints carried by the dynamic roles in the Hownet role frame, together with sentence collocations and the backbone-component information obtained from syntactic structure analysis.
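The template-extraction step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the role set, the context window, or the frequency cut-offs, so the label names (`PER`, `LOC`, `ORG`, `O`), the one-token window, the boundary markers, and the thresholds `high=3` and `mid=2` are all assumptions chosen for illustration. The sketch only collects (left context, entity type, right context) templates from a labeled corpus and splits them by frequency; it does not perform TBL rule learning or any Hownet lookup.

```python
from collections import Counter


def extract_templates(sentences, window=1):
    """Count (left-context, entity-type, right-context) templates.

    Each sentence is a list of (token, label) pairs. The label is an entity
    type such as "PER", "LOC" or "ORG", or "O" for non-entity tokens
    (hypothetical label names). Adjacent tokens sharing a label are treated
    as one entity, which is a simplification.
    """
    counts = Counter()
    for sent in sentences:
        n = len(sent)
        i = 0
        while i < n:
            label = sent[i][1]
            if label == "O":
                i += 1
                continue
            # Extend the span over consecutive tokens with the same label.
            j = i
            while j < n and sent[j][1] == label:
                j += 1
            # Context tokens on each side; boundary markers are assumed names.
            left = tuple(tok for tok, _ in sent[max(0, i - window):i]) or ("<BOS>",)
            right = tuple(tok for tok, _ in sent[j:j + window]) or ("<EOS>",)
            counts[(left, label, right)] += 1
            i = j
    return counts


def split_by_frequency(counts, high=3, mid=2):
    """Split templates into high- and intermediate-frequency sets.

    The thresholds are illustrative assumptions, not values from the thesis.
    """
    high_templates = {t for t, c in counts.items() if c >= high}
    mid_templates = {t for t, c in counts.items() if mid <= c < high}
    return high_templates, mid_templates


# Toy labeled corpus, invented for this example.
corpus = [
    [("记者", "O"), ("张三", "PER"), ("报道", "O")],
    [("记者", "O"), ("李四", "PER"), ("报道", "O")],
    [("记者", "O"), ("王五", "PER"), ("报道", "O")],
    [("在", "O"), ("北京", "LOC"), ("举行", "O")],
    [("在", "O"), ("上海", "LOC"), ("举行", "O")],
    [("访问", "O"), ("清华大学", "ORG")],
]
template_counts = extract_templates(corpus)
high_templates, mid_templates = split_by_frequency(template_counts)
```

On this toy corpus the person-name template falls in the high-frequency set and the place-name template in the intermediate-frequency set, which under the approach described above would be the set handed to rule learning.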
Combining role-tagged templates with transformation rules improves the accuracy of CNER boundary detection, and Hownet improves the accuracy of category judgment. Closed and open tests were carried out on the manually annotated "People's Daily" corpus. The precision and recall achieved for place names, person names, and organization names were good. The experiments show that, on the basis of a role-labeling method integrated with TBL and Hownet, combining statistical information with semantic information is feasible, and that the resulting model has some practical value.
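For readers unfamiliar with the evaluation measures mentioned above, the following Python sketch shows one common way to compute precision, recall, and F1 over recognized entities. It assumes exact-match scoring, where an entity counts as correct only if both its boundaries and its type agree with the gold annotation; the abstract does not state which matching criterion the thesis used. The `(start, end, type)` span representation and the sample data are invented for illustration and are not results from the thesis.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match precision, recall and F1 over entity spans.

    `gold` and `predicted` are iterables of (start, end, type) tuples
    (an assumed representation).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: one prediction is fully correct, one has the right boundaries
# but the wrong type, and one gold entity is missed.
gold_spans = {(0, 2, "PER"), (5, 7, "LOC"), (9, 13, "ORG")}
predicted_spans = {(0, 2, "PER"), (5, 7, "ORG")}
p, r, f = precision_recall_f1(gold_spans, predicted_spans)
```

Scoring boundaries and type together in this way means that a boundary error and a category error both count against the system, which matches the two error sources the abstract distinguishes.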
Keywords: Chinese Named Entity, Role tagging, Hownet, Exemplar template, TBL