Research And Implementation Of Multi-source Heterogeneous Information Disambiguation System For Scientific Researchers

Posted on:2019-02-10

Degree:Master

Type:Thesis

Country:China

Candidate:X Chi

GTID:2348330542998750

Subject:Computer Science and Technology

Abstract/Summary:

As one of the forefront topics in information technology, the mining of researchers' information continues to attract attention. Like other information on the Internet, information about researchers is scattered across many sites. Its wide range of sources, diverse structures, and complicated content pose a great challenge for data analysis, so effective disambiguation of researchers' information is an urgent problem. Disambiguating researchers' information is essentially a name disambiguation problem. This thesis adopts a two-step disambiguation method that combines feature attributes with social relation networks. Disambiguation here covers researchers' papers and patents, as well as the fusion of information about their professional social networks. The main contents and work of this research include the following aspects:

(1) Data collection and preprocessing. Different data collection methods are proposed for different data sources, and structured, semi-structured, and unstructured data are preprocessed. The design and implementation of automated crawlers is the focus of this part.

(2) Construction of the researcher ontology model. Features are extracted from the various data sources and used to build ontology models that uniquely identify a researcher, so that the heterogeneous data collected can be stored uniformly to facilitate disambiguation and analysis.

(3) Determination of the disambiguation solution. Traditional disambiguation methods based on feature attributes and on social relation networks are studied, and a two-step clustering disambiguation strategy combining the two is proposed. The disambiguation results are constrained by time nodes and geographic location attributes.

(4) System design and implementation. Data collection, ontology construction, and disambiguation are integrated into the system as modules to achieve effective integration and accurate searching of researchers' information.

A comparative experiment on feature clustering, social network clustering, and two-step clustering, carried out on the system, shows that two-step clustering performs better than the other two methods.
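The abstract does not give the details of the two-step clustering, so the following is only a minimal sketch of the general idea under stated assumptions, not the thesis's actual algorithm. It assumes that step one groups records by a single feature attribute (a normalized affiliation string) and that step two merges groups whose co-author sets overlap. The `Record` schema, the `two_step_cluster` function, and the `min_shared_coauthors` threshold are all hypothetical names introduced for illustration, and the time and geographic constraints mentioned above are not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One paper or patent attributed to an ambiguous name (hypothetical schema)."""
    rec_id: str
    affiliation: str
    coauthors: frozenset


def two_step_cluster(records, min_shared_coauthors=1):
    """Return clusters of record ids; each cluster is assumed to be one person."""
    # Step 1 (assumed feature attribute): group by normalized affiliation.
    by_affiliation = defaultdict(list)
    for rec in records:
        by_affiliation[rec.affiliation.strip().lower()].append(rec)
    clusters = list(by_affiliation.values())

    # Step 2 (assumed social-network signal): merge clusters that share
    # enough co-authors, repeating until no further merge is possible.
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a = set().union(*(r.coauthors for r in clusters[i]))
                b = set().union(*(r.coauthors for r in clusters[j]))
                if len(a & b) >= min_shared_coauthors:
                    clusters[i] = clusters[i] + clusters[j]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return [sorted(r.rec_id for r in c) for c in clusters]


# Toy data, invented for illustration only.
records = [
    Record("p1", "Univ A", frozenset({"Li Wei"})),
    Record("p2", "univ a ", frozenset({"Zhang Min"})),
    Record("p3", "Univ B", frozenset({"Zhang Min", "Wang Fang"})),
    Record("p4", "Univ C", frozenset({"John Smith"})),
]
result = two_step_cluster(records)
print(result)
```

In this toy run, p1 and p2 are grouped in step one because their affiliations normalize to the same string, and p3 joins them in step two through the shared co-author, while p4 stays separate. A fuller version would presumably also refuse merges that conflict on time or location, as the abstract describes.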

Keywords/Search Tags:

two-step clustering, disambiguation system, ontology model, automation crawler

Related items

1	Design And Implementation Of Author Name Disambiguation System Based On Two Step Clustering
2	Research Of Focused Crawler Based On Semantic Disambiguation Hidden Markov Model
3	Research And Implementation Of On Semi-automatic Ontology Construction Base On WordNet And Focused Crawler
4	Design And Implementation Based On Relational Database Of Ontology Semiautomatic Construction System
5	Research And Implementation Of Retrieval System Based On Domain Ontology
6	Research On Algorithms Of Real Estate-Ontology Topical Crawler
7	Design And Implementation Of An Ontology-based Multimedia Material Web Crawler
8	Author Name Disambiguation Based Rule And Graph Model
9	Research On Semantic Crawler Algorithm And System Realization Based On Ontology
10	System Design And Implementation Based On Crawler And Text Clustering For Network Public Opinion Analysis