
Research On Methods For Summarizing RDF Datasets

Posted on: 2018-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: C Jin
Full Text: PDF
GTID: 2348330512498172
Subject: Computer Science and Technology
Abstract/Summary:
As the foundation of the Semantic Web, the Resource Description Framework (RDF) is a data model advocated by the World Wide Web Consortium (W3C) for describing resources and the relationships between them. With the development of the Semantic Web and the emergence of more and more open data portals, a large amount of entity-centric structured RDF data has been published on the World Wide Web (WWW) for sharing and reuse. An RDF dataset is generally large and may involve multiple topics. With only metadata such as author and release date to go on, users have difficulty determining quickly whether a dataset meets their needs. Helping users quickly inspect the contents of a dataset and assess its usefulness has therefore become a challenge. A summary can provide a quick probe of a dataset because it gives a general or representative description of its contents. In this paper, we propose abstractive and extractive summary generation methods for an RDF dataset, so that users can inspect its contents and assess its usefulness in a short time. This work makes two main contributions:

1. An abstractive summary generation method for an RDF dataset is proposed. It takes into account the coverage of the dataset, cohesion within groups, overlap between groups, homogeneity of groups, and the height of the hierarchy. These factors are formulated as a combinatorial optimization problem, for which we present an efficient solution. A user-controllable hierarchical summary is then generated within the available computational resources. Experiments on real-world datasets demonstrate the effectiveness of the method. In addition, we implement a prototype system based on this algorithm.

2. An extractive summary generation method for an RDF dataset is also proposed. It takes into account the coverage of the entity types and properties in triples, the familiarity of entities in the dataset, and the cohesion of the summary snippet. These factors are formulated as a new combinatorial optimization problem called the maximum-weight-and-coverage connected graph problem (MwcCG), for which a constant-factor approximation algorithm is proposed. Finally, experiments comprising a quantitative analysis and a user study of the summary snippets generated for real-world datasets by our approach and by a baseline approach validate the effectiveness of the summaries.
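
To make the idea of an extractive snippet more concrete, the following is a minimal sketch in Python. It is not the constant-factor approximation algorithm of the thesis, which the abstract does not specify. It is a plain greedy heuristic that selects up to k triples, keeps the selection connected through shared entities, and prefers triples that cover a class or property not yet covered. The toy triples, the frequency-based weights, and the names `feature`, `feature_weights`, `entities`, and `greedy_snippet` are all assumptions made for illustration, and entity familiarity is left out.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

# Hypothetical toy dataset of (subject, predicate, object) triples.
TRIPLES = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:acme", RDF_TYPE, "ex:Company"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:acme", "ex:locatedIn", "ex:nanjing"),
]


def feature(triple):
    """Return what a triple covers: the class it asserts or the property it uses."""
    _, p, o = triple
    return ("class", o) if p == RDF_TYPE else ("property", p)


def feature_weights(triples):
    """Assumed weighting: relative frequency of each class/property in the dataset."""
    counts = Counter(feature(t) for t in triples)
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}


def entities(triple):
    """Entity nodes touched by a triple (the object of a type triple is a class, not an entity)."""
    s, p, o = triple
    return {s} if p == RDF_TYPE else {s, o}


def greedy_snippet(triples, k):
    """Greedily pick up to k connected triples, maximizing newly covered weight at each step."""
    weights = feature_weights(triples)
    chosen, covered, nodes = [], set(), set()
    remaining = list(triples)
    while remaining and len(chosen) < k:
        # After the first pick, only triples sharing an entity with the snippet are allowed.
        candidates = [t for t in remaining if not chosen or entities(t) & nodes]
        if not candidates:
            break

        def gain(t):
            f = feature(t)
            return 0.0 if f in covered else weights[f]

        best = max(candidates, key=gain)  # ties go to the earliest triple
        chosen.append(best)
        covered.add(feature(best))
        nodes |= entities(best)
        remaining.remove(best)
    return chosen, sum(weights[f] for f in covered)


if __name__ == "__main__":
    snippet, coverage = greedy_snippet(TRIPLES, k=3)
    for triple in snippet:
        print(triple)
    print(f"covered weight: {coverage:.3f}")
```

A greedy choice of this kind carries no approximation guarantee for the connected variant of the problem; it only illustrates how coverage and cohesion can pull a snippet in different directions.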
Keywords/Search Tags: RDF, hierarchical, abstractive, extractive, dataset summary