
Research And Implementation Of Multi-source Heterogeneous Information Disambiguation System For Scientific Researchers

Posted on: 2019-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: X Chi
GTID: 2348330542998750
Subject: Computer Science and Technology
Abstract/Summary:
As one of the frontier topics in information technology, mining information about researchers continues to attract attention. Like other information on the Internet, researchers' information is scattered across many sites. Its wide range of sources, diverse structures, and complicated content pose a great challenge for data analysis, so effective disambiguation of researchers' information is an urgent problem. Disambiguating researchers' information is essentially a name disambiguation problem. This thesis adopts a two-step disambiguation method that combines feature attributes with social relation networks. The disambiguation covers researchers' papers and patents, as well as the fusion of information about their professional social networks. The main contents and work of this research are as follows:
(1) Data collection and preprocessing. Different collection methods are proposed for different data sources, and structured, semi-structured, and unstructured data are preprocessed. The design and implementation of automated crawlers are the focus of this part.
(2) Construction of the researchers' ontology model. Features are extracted from all kinds of data sources and used to build ontology models that uniquely identify a researcher, so that the heterogeneous data collected can be stored uniformly to facilitate disambiguation and analysis of researchers.
(3) Determining the disambiguation solution. The traditional disambiguation methods based on feature attributes and on social relation networks are studied, and a two-step clustering disambiguation strategy combining the two is proposed. The disambiguation results are constrained by time-point and geographic location attributes.
(4) Design and implementation of the system. Data collection, ontology construction, and the disambiguation methods are integrated into the system as modules to achieve effective integration and accurate searching of researchers' information. A comparative experiment on the system covering feature clustering, social network clustering, and two-step clustering shows that two-step clustering performs better than the other two methods.
Keywords/Search Tags: two-step clustering, disambiguation system, ontology model, automation crawler
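
The abstract does not give the details of the two-step clustering, so the following is only a minimal sketch of the general idea under stated assumptions, not the thesis's implementation. It assumes a hypothetical `Record` schema (affiliation, keywords, coauthors), uses Jaccard similarity of keywords with an arbitrary threshold for the feature-attribute step, and merges clusters that share a coauthor for the social-relation step. The time and geographic constraints mentioned in the abstract are omitted.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Record:
    """One paper or patent under an ambiguous author name (hypothetical schema)."""
    rid: str
    affiliation: str
    keywords: set = field(default_factory=set)
    coauthors: set = field(default_factory=set)


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class UnionFind:
    """Disjoint-set structure used to accumulate cluster merges."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def two_step_cluster(records, feature_threshold=0.5):
    """Group records believed to belong to the same person.

    Step 1 (assumed rule): same affiliation and keyword Jaccard >= threshold.
    Step 2 (assumed rule): clusters from step 1 that share any coauthor merge.
    """
    uf = UnionFind([r.rid for r in records])

    # Step 1: feature-attribute clustering.
    for r1, r2 in combinations(records, 2):
        if (r1.affiliation == r2.affiliation
                and jaccard(r1.keywords, r2.keywords) >= feature_threshold):
            uf.union(r1.rid, r2.rid)

    # Step 2: social-relation clustering over the step-1 clusters.
    cluster_coauthors = {}
    for r in records:
        cluster_coauthors.setdefault(uf.find(r.rid), set()).update(r.coauthors)
    for (c1, s1), (c2, s2) in combinations(list(cluster_coauthors.items()), 2):
        if s1 & s2:
            uf.union(c1, c2)

    groups = {}
    for r in records:
        groups.setdefault(uf.find(r.rid), []).append(r.rid)
    return sorted(sorted(g) for g in groups.values())


records = [
    Record("p1", "Univ A", {"ontology", "crawler"}, {"Li"}),
    Record("p2", "Univ A", {"ontology", "crawler", "clustering"}, {"Wang"}),
    Record("p3", "Univ A", {"patent"}, {"Wang"}),
    Record("p4", "Univ B", {"chemistry"}, {"Zhao"}),
]
print(two_step_cluster(records))  # [['p1', 'p2', 'p3'], ['p4']]
```

In this toy data, `p1` and `p2` merge in the first step on keyword overlap, while `p3` shares no keywords with them and is only attached in the second step through the shared coauthor. This illustrates why combining the two signals can recover links that either one alone would miss.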