A Study On Person Name Discrimination Algorithm Based On Two-Stage Clustering

Posted on:2013-07-17

Degree:Master

Type:Thesis

Country:China

Candidate:L W Zhang

GTID:2298330467478187

Subject:Computer software and theory

Abstract/Summary:

With the growing popularity of the Internet, submitting queries to a search engine has become the primary way people obtain information online. Person name retrieval is one of the most common search tasks. A search engine makes it easy to find information about a person, but because many people share the same name, a name query often returns a long list of results that mixes pages about different people with that name. To find a specific person, users must either add distinguishing terms to the query or browse the result list to pick out the intended person from among many namesakes, which significantly reduces search efficiency. It is therefore necessary to study person name disambiguation algorithms that improve the efficiency of name retrieval.

Building on an analysis of existing name disambiguation theory and techniques, this thesis proposes a two-stage clustering method for person name disambiguation. Person attributes are important for distinguishing people who share a name, so we first extract 16 kinds of main person attributes. For the nine kinds that are relatively easy to extract, we use traditional regular-expression and dictionary matching; for the seven kinds that are difficult to extract, we use an automated extraction method based on self-expansion. Each result returned by the search engine is then represented as an attribute vector, and the initial clustering is completed by calculating the similarity between documents. Because not all pages contain attribute information, pages without person attributes cannot be clustered correctly in this initial stage. This thesis therefore proposes a second-stage clustering method based on semantic relations.

First, we extract and quantify the semantic relations between Wikipedia concepts to construct a semantic relationship graph. Second, we use the SimRank algorithm to calculate the similarity between any two nodes. Third, the initial clustering results are represented as Wikipedia concept vectors. Finally, the similarity between clusters is computed from the semantic relations between concepts, which completes the second-stage clustering.

Experimental results show that the proposed algorithm, which combines the two clustering stages, significantly improves precision and recall and performs better than previous methods. This demonstrates that the proposed algorithm is effective for the person name disambiguation problem.
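
The two stages described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not state which document similarity measure, clustering procedure, or SimRank parameters were used, so the Jaccard overlap of attribute pairs, the greedy threshold clustering, the threshold of 0.5, the decay factor of 0.8, and all function and variable names below are assumptions made for illustration. The SimRank function is the plain unweighted form, whereas the thesis's graph carries computed semantic relation values.

```python
from itertools import product


def attribute_similarity(a, b):
    """Jaccard overlap of (attribute, value) pairs. Assumed measure."""
    pairs_a, pairs_b = set(a.items()), set(b.items())
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 0.0


def initial_clusters(pages, threshold=0.5):
    """Greedy first-stage clustering (assumed procedure).

    A page joins the first cluster containing a sufficiently similar page;
    otherwise it starts a new cluster. Pages with no attributes always end
    up alone, which is the gap the second stage is meant to close.
    """
    clusters = []
    for page in pages:
        for cluster in clusters:
            if any(attribute_similarity(page, other) >= threshold
                   for other in cluster):
                cluster.append(page)
                break
        else:
            clusters.append([page])
    return clusters


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Plain unweighted SimRank.

    in_neighbors maps every node to the list of nodes pointing at it;
    every node mentioned in a list must also appear as a key.
    """
    nodes = list(in_neighbors)
    sim = {(u, v): 1.0 if u == v else 0.0
           for u, v in product(nodes, repeat=2)}
    for _ in range(iterations):
        new = {}
        for u, v in product(nodes, repeat=2):
            if u == v:
                new[(u, v)] = 1.0
                continue
            nu, nv = in_neighbors[u], in_neighbors[v]
            if not nu or not nv:
                # A node without in-neighbors is similar only to itself.
                new[(u, v)] = 0.0
                continue
            total = sum(sim[(a, b)] for a in nu for b in nv)
            new[(u, v)] = decay * total / (len(nu) * len(nv))
        sim = new
    return sim


if __name__ == "__main__":
    # Hypothetical pages for one ambiguous name; the last has no attributes.
    pages = [
        {"occupation": "professor", "birthplace": "Beijing"},
        {"occupation": "professor", "birthplace": "Beijing", "employer": "X"},
        {},
    ]
    print(initial_clusters(pages))

    # Hypothetical concept graph: "A" and "B" are both linked from "C".
    graph = {"C": [], "A": ["C"], "B": ["C"]}
    print(simrank(graph)[("A", "B")])
```

In this sketch the attribute-free page is left in its own cluster after the first stage. A second stage in the spirit of the abstract would represent each cluster by the Wikipedia concepts it mentions and merge clusters whose concepts score highly under `simrank`.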

Keywords/Search Tags:

person name disambiguation, attribute extraction, semantic relationship graph, clustering

Related items

1	Research On Chinese Person Disambiguation Based On Sentential Semantic Structure And Personal Implicit Relationship
2	Person Name Disambiguation Based On Hierarchical Clustering And Web Page Relationship
3	Research On Crucial Technologies Of Web Person Name Entity Disambiguation
4	Research On Cluster-based Person Name Disambiguation
5	Research And Application Of News Event Clustering Algorithm Based On Semantic Relationship Graph
6	Research And Implementation Of Person Name Disambiguation
7	Research On The Application Of Person Name Disambiguation Based On Improved Agglomerative Hierarchical Clustering
8	Research On Chinese Person Name Disambiguation Algorithm
9	The Research On Personal Name Disambiguation And Character Relationship Extraction Merging Sentential Semantic Feature
10	Text Clustering Method In Topic Detection And Person Name Disambiguation