
Research And Implementation Of Person Name Disambiguation

Posted on: 2015-06-01
Degree: Master
Type: Thesis
Country: China
Candidate: W Han
Full Text: PDF
GTID: 2298330467963539
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development of computer technology and the widespread use of the Internet, the amount of information on the web has grown rapidly. This growth gives us access to a wealth of information, but it also makes it harder to quickly find the information we really need. Different individuals often share the same name, so person names are highly ambiguous, and finding information about one specific person becomes more difficult. As a key technology for solving this problem, person name disambiguation has attracted increasing attention from scholars in China and abroad. A series of evaluation campaigns has been held, which has pushed forward the development of related research fields.

Building on previous research, this thesis addresses the problem of Chinese person name disambiguation. The main work includes the following parts:

This thesis proposes a term attribute model to represent a person. The model integrates person attributes and keyword features to denote a person.

Moreover, this thesis designs a series of person attributes and corresponding methods to extract them. This work enriches the basic attributes of a person and yields more attribute templates than before. When extracting person attributes, we use search engine results to extend some of the attributes.

Then, we design a method to calculate the similarity between two articles using keywords and person attributes, together with a clustering method. This thesis also proposes a method to calculate the semantic similarity between two words.

Based on the above techniques, we design and implement a person name disambiguation system. The system is evaluated on both a public evaluation corpus and a web corpus. Comparison with other systems shows that the proposed methods are effective, and the system achieves better performance than the best existing system on both data sets.
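The abstract does not give the concrete formulas, so the following is only a minimal sketch of the general idea of combining person attributes and keywords into one document similarity and then clustering. Everything specific here is an assumption for illustration, not the thesis's method: the attribute names, the equal weighting `alpha`, the use of cosine similarity for keywords, the threshold, and the single-link clustering are all hypothetical choices.

```python
import math
from itertools import combinations


def keyword_cosine(a, b):
    """Cosine similarity between two keyword -> weight dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute_overlap(a, b):
    """Fraction of attribute slots present in both dicts whose values agree."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return sum(a[k] == b[k] for k in shared) / len(shared)


def document_similarity(d1, d2, alpha=0.5):
    """Weighted mix of attribute agreement and keyword cosine (alpha is assumed)."""
    attr = attribute_overlap(d1["attributes"], d2["attributes"])
    kw = keyword_cosine(d1["keywords"], d2["keywords"])
    return alpha * attr + (1 - alpha) * kw


def cluster(docs, threshold=0.5):
    """Single-link clustering: merge any two documents at or above the threshold."""
    parent = list(range(len(docs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(docs)), 2):
        if document_similarity(docs[i], docs[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical documents that all mention the same ambiguous name.
DOCS = [
    {"attributes": {"occupation": "professor", "employer": "University A"},
     "keywords": {"research": 1.0, "physics": 2.0}},
    {"attributes": {"occupation": "professor"},
     "keywords": {"physics": 1.0, "lecture": 1.0}},
    {"attributes": {"occupation": "athlete"},
     "keywords": {"football": 2.0, "match": 1.0}},
]

if __name__ == "__main__":
    print(cluster(DOCS))
```

In this toy input, the first two documents agree on the occupation attribute and share a keyword, so they end up in one cluster, while the third document forms its own cluster. Each resulting cluster stands for one real-world person who bears the ambiguous name.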
Keywords/Search Tags: person name disambiguation, clustering, attribute extraction, person model