
Name Disambiguation In Scientific Cooperation Network

Posted on: 2012-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: Q Lin
GTID: 2218330362956537
Subject: Computer application technology
Abstract/Summary:
Scientific cooperation networks contain many scientists who share the same name. Well-known academic platforms such as Arnetminer, Springer, ACM, DBLP and CiteSeer treat several scientists with the same name as one person when they evaluate scientists' academic ability. These ambiguous names introduce many errors into the cooperation network and confuse research built on it, so name disambiguation is a meaningful problem.

Existing name disambiguation algorithms mainly extract features from co-author information, authors' organizations and citation relationships and feed them into a graph model. These algorithms share two weaknesses: low recall and no capacity for continuous learning. In this thesis, name disambiguation is recast as a classification problem: deciding whether two papers were written by the same author. By adopting the features used by existing algorithms and analyzing the manual process of name disambiguation, the following features are extracted: Co-Author, Co-Org, Citation, Homepage, Title Similarity, PDF File, and Dig-Lib. A perceptron is used as the classifier, and the Homepage feature is used as a constraint.

User feedback is incorporated into the algorithm to improve performance. Feedback is classified into three types according to the credibility of the user. Two new features are extracted from it as inputs to the perceptron, and feedback from highly credible users is adopted as additional constraints. By treating feedback as a training stream, the perceptron can be improved continuously. Experiments show that after incorporating user feedback, the algorithm learns continuously and achieves better performance. The algorithm has been applied to Arnetminer.
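The pairwise formulation in the abstract (decide whether two papers share an author, classify with a perceptron, treat Homepage as a constraint, and keep learning from a feedback stream) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the class name `PairwisePerceptron`, the snake_case feature keys, the feature values, the learning rate, and the reading of the Homepage constraint as a hard "same author" override. The three user-credibility types and the two feedback-derived features are left out because the abstract does not specify them.

```python
from dataclasses import dataclass, field

# Hypothetical keys for the pairwise features named in the abstract.
# "homepage" is handled separately as a constraint, not as a weighted input.
FEATURES = ["co_author", "co_org", "citation",
            "title_similarity", "pdf_file", "dig_lib"]


@dataclass
class PairwisePerceptron:
    """Perceptron over features of a paper pair: +1 = same author, -1 = different."""
    weights: dict = field(default_factory=lambda: {n: 0.0 for n in FEATURES})
    bias: float = 0.0
    learning_rate: float = 1.0  # assumed value

    def score(self, x):
        # Linear score; features missing from x count as 0.
        return self.bias + sum(self.weights[n] * x.get(n, 0.0) for n in FEATURES)

    def predict(self, x):
        # Assumed interpretation of the Homepage constraint: if both papers
        # are listed on the same homepage, force a "same author" decision.
        if x.get("homepage", 0.0) >= 1.0:
            return 1
        return 1 if self.score(x) > 0 else -1

    def update(self, x, label):
        # Standard perceptron rule: adjust weights only on a mistake.
        if label * self.score(x) <= 0:
            for n in FEATURES:
                self.weights[n] += self.learning_rate * label * x.get(n, 0.0)
            self.bias += self.learning_rate * label
            return True
        return False


# Feedback arriving over time is consumed as a training stream
# (made-up examples: feature dict for a paper pair, then its label).
model = PairwisePerceptron()
feedback_stream = [
    ({"co_author": 1.0, "co_org": 1.0}, 1),
    ({"title_similarity": 0.2}, -1),
    ({"co_author": 1.0}, 1),
]
for pair_features, label in feedback_stream:
    model.update(pair_features, label)

print(model.predict({"co_author": 1.0}))
print(model.predict({"homepage": 1.0, "title_similarity": 1.0}))
```

Because the update touches the weights only when the current prediction is wrong, each new piece of feedback can be applied as it arrives without retraining from scratch, which is one plausible way to obtain the continuous learning the abstract describes.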
Keywords/Search Tags: name disambiguation, academic network, feature extraction, constraint, user feedback