
Research And Implementation Of Key Techniques Of Personal Name Disambiguation

Posted on: 2013-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: X Wang
Full Text: PDF
GTID: 2268330392967957
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the mobile Internet era, Internet access is becoming more convenient and the number of terminals is increasing, which leads to rapid growth in the speed and frequency of information publishing. Searching for information about a specific person is one of the main purposes of information seeking on the Internet. However, because many people share the same name, web texts contain serious reference ambiguities. General-purpose search engines return ambiguous results and cannot organize the information effectively, so users spend a lot of time picking out the person of interest from the many others of the same name, and they risk missing important information. It is therefore important to eliminate these ambiguities effectively and present each person's information to users in an organized form. To this end, this thesis makes contributions in the following four aspects.
First, this thesis discusses the process of manually annotating a name ambiguity corpus and proposes a two-stage disambiguation strategy based on Adaptive Resonance Theory (ART) to imitate this process: in the first stage, categories representing persons are constructed and documents are classified into them; in the second stage, similar sets are merged by a hierarchical clustering method. The system thus constructs the target concept sets and eliminates the ambiguities in a human-like manner. Experiments are designed to verify the validity of the two-stage strategy. Compared with the agglomerative clustering method, the strategy improves performance by 0.92% and 5.00%, respectively, on two kinds of name recognition results.
Second, this thesis implements a human-machine collaborative system to assist in establishing recognition rules and a variety of knowledge dictionary resources, which are then used in the recognition system. By comparing the rule-based method with other named entity recognition tools, ISLEX and LTP, this thesis verifies the validity and efficiency of the rule-based method and shows that it is practical for the name disambiguation system.
Third, this thesis annotates the Sogou News corpus to build a corpus resource of real Internet data. The importance of persons' attributes in the Internet corpus and the characteristics of these attributes are explained. Experiments on the Internet corpus verify the validity of the attribute features.
Fourth, this thesis analyzes the tasks and functions of the personal name disambiguation system. A name disambiguation module based on the knowledge resources is designed and implemented. Together with the other modules for page crawling, page analysis, and data storage, a complete name disambiguation system is implemented to eliminate these ambiguities in news search results.
Keywords/Search Tags: personal name disambiguation, organization name recognition, attribute extraction, adaptive resonance theory
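Illustrative sketch: the two-stage strategy summarized in the abstract (first build person categories and assign documents, then merge similar sets hierarchically) might look roughly like the Python below. This is not the thesis's implementation. It assumes each document is already reduced to a list of feature tokens, and it swaps the actual ART match and choice functions for a plain cosine similarity against a summed prototype. The function names `stage_one` and `stage_two`, the `vigilance` and `merge_threshold` parameters, and the toy documents are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def stage_one(docs, vigilance):
    """ART-like pass (simplified): each document joins its best-matching
    category if the match reaches the vigilance level, else opens a new one."""
    categories = []  # each entry: {"proto": Counter, "members": [doc indices]}
    for i, doc in enumerate(docs):
        vec = Counter(doc)
        best, best_sim = None, 0.0
        for cat in categories:
            sim = cosine(vec, cat["proto"])
            if sim > best_sim:
                best, best_sim = cat, sim
        if best is not None and best_sim >= vigilance:
            best["proto"].update(vec)  # prototype = summed counts of members
            best["members"].append(i)
        else:
            categories.append({"proto": vec, "members": [i]})
    return categories


def stage_two(categories, merge_threshold):
    """Agglomerative merge: repeatedly fuse the most similar pair of
    categories until no pair reaches merge_threshold."""
    cats = [{"proto": Counter(c["proto"]), "members": list(c["members"])}
            for c in categories]
    while len(cats) > 1:
        best_pair, best_sim = None, merge_threshold
        for x in range(len(cats)):
            for y in range(x + 1, len(cats)):
                sim = cosine(cats[x]["proto"], cats[y]["proto"])
                if sim >= best_sim:
                    best_pair, best_sim = (x, y), sim
        if best_pair is None:
            break
        x, y = best_pair
        cats[x]["proto"].update(cats[y]["proto"])
        cats[x]["members"].extend(cats[y]["members"])
        del cats[y]
    return [sorted(c["members"]) for c in cats]


# Toy documents mentioning the same name (hypothetical feature tokens).
docs = [
    ["professor", "harbin", "computer"],
    ["professor", "harbin", "nlp"],
    ["harbin", "computer", "nlp"],
    ["striker", "football", "league"],
    ["football", "league", "goal"],
]
clusters = stage_two(stage_one(docs, vigilance=0.7), merge_threshold=0.5)
print(clusters)
```

In this sketch a strict vigilance makes the first stage over-split on purpose, and the second stage recovers the person-level groups by merging. How the thesis balances the two stages is not stated in the abstract.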