A Study On Biomedical Linked Data Cleansing And Integration

Posted on: 2018-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: H L Qiu
Full Text: PDF
GTID: 2348330512998172
Subject: Computer technology
Abstract/Summary:
The rapid development of Semantic Web technologies offers a promising mechanism for representing and integrating massive data. At the same time, cleansing and integrating biomedical data is in growing demand because of the extremely large data volumes and numerous subdomains in the biomedical field. Much effort has been devoted to using Semantic Web standards and technologies to create a network of biomedical linked data. Problems remain, however. Many datasets provide cross-references to other data sources, but these are generally incomplete and error-prone. Plenty of datasets are available on the Linked Open Data cloud, but the data are accessible only through SPARQL queries, which is unfriendly to non-expert users. In addition, different datasets use different ontologies, which makes it difficult to aggregate query results.

In this paper, biomedical linked data is analyzed and processed using data cleansing and integration methods. Data cleansing techniques analyze and verify data to eliminate duplicated, erroneous, and missing data. For data integration, ontology matching and entity linking are the two main problems. The main contributions of this paper are as follows:

1. An empirical link analysis is conducted on datasets obtained from Bio2RDF. Three different link graphs, for datasets, entities, and terms, are characterized. The findings can be used in cleansing datasets and in other applications. A benchmark is also built to evaluate entity matching approaches.
2. Datasets are cleansed by completing missing values, correcting erroneous data, and eliminating duplicate data. Missing links are completed and erroneous links are corrected by analyzing the symmetry and transitivity of entity links.
3. The cleansed datasets are integrated into an entity search and browsing system called BioSearch, which provides a simplified query language and allows users to efficiently obtain the biomedical linked data they are interested in. A user evaluation demonstrated that BioSearch is more effective and usable than two state-of-the-art search and browsing solutions.
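
The idea of completing missing links through symmetry and transitivity can be illustrated with a small sketch. It is not the thesis's implementation: it assumes that every cross-reference is an equivalence assertion between two entities, groups entities into clusters with a union-find structure, and reports every ordered pair inside a cluster that is not already linked. The function name `complete_links` and the entity identifiers in the usage note are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def complete_links(links):
    """Return links implied by symmetry and transitivity but absent from `links`.

    `links` is an iterable of (source, target) pairs. Each pair is treated as
    an equivalence assertion, which is an assumption of this sketch.
    """
    parent = {}

    def find(x):
        # Find the cluster representative of x, shortening paths on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    existing = set()
    for a, b in links:
        existing.add((a, b))
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two clusters

    # Group entities by their cluster representative.
    clusters = defaultdict(list)
    for node in list(parent):
        clusters[find(node)].append(node)

    # Inside a cluster, every ordered pair should be linked.
    missing = set()
    for members in clusters.values():
        for a, b in combinations(members, 2):
            for pair in ((a, b), (b, a)):
                if pair not in existing:
                    missing.add(pair)
    return missing
```

For example, given the two links `("drugbank:X", "kegg:Y")` and `("kegg:Y", "pharmgkb:Z")`, the function returns the four remaining directed pairs among the three entities, including the reverse link `("kegg:Y", "drugbank:X")` and the transitive link `("drugbank:X", "pharmgkb:Z")`. Taking the closure blindly also propagates any erroneous input link across its whole cluster, so a sketch like this covers only the completion step and not the error correction the abstract also mentions.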
Keywords/Search Tags: Semantic Web, Linked Data, Data Cleansing, Data Integration