Research On Summarization Methods For Large RDF Graphs

Posted on: 2023-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: J M Guo
Full Text: PDF
GTID: 2530306800989159
Subject: Computer application technology
Abstract/Summary:
With the continuous development of the Semantic Web, Resource Description Framework (RDF) data has been widely used for knowledge modeling and data reuse in various fields and is growing explosively. The semantic graph formed by RDF data, called an RDF graph (RDFG), has grown from millions of linked data items at the beginning to hundreds of millions. For example, Linked Open Data (LOD) contains more than 62 billion data items so far. Large-scale, heterogeneous RDF graphs in various fields are difficult for users to explore globally and to understand. Each large RDF graph usually contains different kinds of data, which further increases the difficulty of querying. To avoid information overload and to present the required information in limited space, RDF graph summarization has become an effective solution. An RDF graph summary, which contains concise and crucial data, is used in place of the original RDF graph. RDF summarization is widely used in graph querying, graph structure browsing, graph pattern discovery, and graph inference.

At present, RDFG summarization is a hot topic in the fields of the Semantic Web and knowledge graphs. Although researchers have proposed various methods for computing RDF graph summaries, efficient and effective summarization remains challenging because of the volume and heterogeneity of RDF graphs. How to extract key and representative data from massive data is still an open problem in RDFG summarization. Most existing methods apply a single strategy, so their summaries retain only one aspect of the original RDF graph, such as graph structure or node importance, which cannot meet the needs of data reuse in various fields. To overcome these shortcomings, this thesis proposes summarizing RDF graphs from the aspects of user SPARQL queries, node centrality, and node characteristics. Our contributions are as follows:

(1) Based on the history of SPARQL queries over an RDF graph, we propose an RDF graph summary model based on user queries and node importance, which takes into account the graph's local characteristics and integrity. The model extracts the RDFG data that users are interested in to meet their personalized SPARQL query requirements. (i) Based on the summary model, this thesis proposes two summary algorithms, Summary KG and Query Sum KB, and verifies them experimentally. The resulting RDFG summary improves users' query efficiency, meets their personalized query requirements, and reflects the graph structure to some extent. (ii) We conduct an experimental analysis on large-scale RDF graphs including DBpedia, YAGO, and Freebase, and the results show that the algorithms are effective in terms of summarization time and query accuracy.

(2) We propose a new RDFG summarization method that summarizes the RDF graph structure mainly on the basis of node characteristics and centrality, and that divides the node characteristics in the RDF graph into the same-CS characteristics relation and the same-type relation. (i) Based on these relations, this thesis proposes two summary algorithms: Sum W, which is based on W relations, and Sum S, which is based on S relations. (ii) In addition, we calculate each node's frequency and connection coefficient to obtain the centrality of nodes and property edges in the RDF graph. Based on this node centrality method, this thesis proposes the Summary FL algorithm, which builds on the graph structure summary to retain the information of important nodes and property edges. (iii) Finally, experiments are carried out on the existing large-scale datasets AGROVOC, DBpedia, Wikidata, and LinkedGeoData, and the algorithms proposed in this thesis are compared with other summary algorithms. The experimental results show that the algorithms retain the RDF structure and ensure the accuracy and effectiveness of the summary.

To sum up, this thesis presents RDF graph summarization methods based on user SPARQL queries, node importance, node characteristics, and centrality. Experimental results show that the proposed algorithms retain the structural characteristics of the RDF graphs and also help users query more efficiently and explore the graphs quickly.
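The idea of summarizing by a same-type relation can be illustrated with a generic type-based quotient summary. The sketch below is not the Sum S or Sum W algorithm from the thesis, whose details the abstract does not give. It only assumes that nodes with the same set of rdf:type values are merged into one summary node and that the remaining property edges are lifted to those merged nodes. The function name `type_based_summary` and the sample `ex:` triples are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def type_based_summary(triples):
    """Merge nodes that share the same set of rdf:type values (illustrative only)."""
    # Collect the type set of every typed subject.
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    def group(node):
        # Typed nodes map to a label built from their sorted types;
        # untyped nodes are kept unchanged (an assumption of this sketch).
        node_types = types.get(node)
        return "|".join(sorted(node_types)) if node_types else node

    # Lift every non-type edge to the summary nodes; the set removes duplicates.
    summary = set()
    for s, p, o in triples:
        if p != RDF_TYPE:
            summary.add((group(s), p, group(o)))
    return summary


# Hypothetical sample data: two persons working for one company.
triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:acme", "rdf:type", "ex:Company"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
]

print(type_based_summary(triples))
```

On this sample, the two `ex:worksFor` edges collapse into a single edge between the `ex:Person` and `ex:Company` summary nodes, which shows how such a summary can stay much smaller than the original graph while keeping its overall structure.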
Keywords/Search Tags: RDF graph, knowledge graph, RDF graph summarization, node characteristics, SPARQL query