The Research On Personal Name Disambiguation And Character Relationship Extraction Merging Sentential Semantic Feature

Posted on: 2016-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhang
GTID: 2298330452464870
Subject: Information and Communication Engineering
Abstract/Summary:
Name ambiguity is a form of identity uncertainty: the same name in text can refer to different entities in the real world. Name disambiguation has considerable practical value. It is fundamental to search engines, social networks, and knowledge base construction, and it is widely applied in personalized search, automatic question answering, multi-document summarization, the tracking and discovery of popular figures, and other fields. Once personal name disambiguation has retrieved the texts about a person of interest, readers often care about that person's relationships with others. Identifying the people in a text is often not enough for practical applications; what kind of relationship exists between them matters more. Because the relationships between people are scattered throughout the text, they need to be extracted automatically, quickly, and accurately.

Cross-document personal name disambiguation is the process of distinguishing the different people who share the same name across texts. It is an important part of name retrieval technology and has in recent years become a key problem in natural language processing. To address insufficiently deep text analysis, and the loss of fine-grained information and the noise that result from it, this thesis presents a multi-stage disambiguation algorithm. First, after the documents are preprocessed, heuristic rules based on the characteristics of query terms that also occur as common words determine whether the query term is a personal name. Then named entities and occupations are extracted with feature templates, a sentential semantic model is used for sentential semantic analysis and to extract sentential semantic features, and word frequencies are counted under the bag-of-words model. Finally, these features form a three-layer feature space, and rule-based classification together with a two-stage hierarchical clustering algorithm performs the name disambiguation. Experiments on datasets built from the CLP2012 Chinese personal name disambiguation task show an F value of 88.79%, and show that introducing sentential semantic features further improves personal name disambiguation.

Relationships are often not expressed in structured form, so an efficient automatic algorithm for extracting relationships between people is urgently needed. To address insufficiently deep analysis of triple features and the need to set seed words manually, an automatic character relationship extraction algorithm is proposed, based on extracting and analyzing the relation keywords of sentences. The method first uses word-frequency statistics on a small labeled corpus and a Bootstrapping algorithm on a large unlabeled corpus to obtain a dictionary of relation training features. Then element-distance optimization rules are used to construct triple instances from sentences, and lexical-level features are fused with sentential semantic features to construct the triple feature space. Finally, a binary decision is made for each triple, and the confidence maximization principle determines the relation category. Experiments on the BFS popular character retrieval corpus show that the method reaches an F value of 83.8%, a good experimental result.
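
The last step of the relationship extraction method, a binary decision per triple followed by confidence maximization, can be illustrated with a minimal Python sketch. This is not the thesis implementation: the triple layout `(person_a, person_b, keyword)`, the relation labels, the `threshold` parameter, and the toy keyword scorers standing in for trained binary classifiers are all assumptions made for illustration.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical triple layout: (person_a, person_b, relation_keyword).
Triple = Tuple[str, str, str]
# A binary scorer returns a confidence in [0, 1] that the triple
# expresses one particular relation.
Scorer = Callable[[Triple], float]


def pick_relation(triple: Triple,
                  scorers: Dict[str, Scorer],
                  threshold: float = 0.5) -> Optional[str]:
    """Run one binary scorer per relation and keep the most confident one.

    Returns None when no scorer is strictly above the threshold,
    i.e. every binary decision is negative.
    """
    best_label, best_conf = None, threshold
    for label, scorer in scorers.items():
        conf = scorer(triple)
        if conf > best_conf:
            best_label, best_conf = label, conf
    return best_label


def keyword_scorer(weights: Dict[str, float]) -> Scorer:
    """Toy stand-in for a trained classifier: looks up the keyword's weight."""
    return lambda triple: weights.get(triple[2], 0.0)


# Assumed relation labels and keyword weights, for illustration only.
SCORERS: Dict[str, Scorer] = {
    "spouse": keyword_scorer({"married": 0.9, "wife": 0.8}),
    "colleague": keyword_scorer({"coworker": 0.85, "married": 0.1}),
}

if __name__ == "__main__":
    print(pick_relation(("A", "B", "married"), SCORERS))
    print(pick_relation(("A", "B", "met"), SCORERS))
```

In this sketch each relation has its own yes/no scorer, and the label is assigned only after comparing confidences across all of them, so a triple that several scorers accept still receives a single relation category.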
Keywords/Search Tags: personal name disambiguation, character relationship extraction, sentential semantic feature, natural language processing