
Query Processing Techniques For Large-scale Product Knowledge Graphs

Posted on: 2024-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: C X Fang
GTID: 2568307157482274
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet and the growing diversity of everyday consumer needs, online shopping now generates data at a volume that is difficult to quantify. Compared with general knowledge data, product knowledge data is heterogeneous, massive in scale, and unevenly distributed. As product knowledge graphs continue to grow, users also expect faster responses to knowledge queries. However, existing RDF (Resource Description Framework) query systems often do not fully account for the structural characteristics of product knowledge graphs, so they fail to optimize product knowledge retrieval effectively. In addition, large-scale product knowledge query services must return accurate results in real time, while product knowledge is constantly supplemented and updated to serve different types of query requests. High-performance query processing therefore requires good scalability, so that queries remain real-time and accurate after dynamic data updates. This study focuses on the characteristics of product knowledge data and investigates query processing for large-scale product knowledge. The main work includes:

(1) Optimizing index storage by proposing an RDF knowledge storage and query processing method based on predicate indexing. Based on the structural characteristics of product knowledge data, a predicate-indexed data model is designed that converts RDF triples into entity pairs indexed by predicate, which compresses the stored knowledge data and speeds up index construction and loading. A query optimization algorithm based on query type selection is designed to keep overall query performance efficient. The experimental results show that this solution achieves retrieval performance competitive with mainstream RDF query systems while occupying less disk space and requiring less time to construct the index.

(2) Optimizing query efficiency by proposing an RDF knowledge storage and query processing method based on compressed coding tree indexing. The bottleneck of the binary join strategy in graph queries is the redundancy of intermediate results, which degrades overall query performance. Therefore, based on the idea of worst-case optimal join algorithms, the query execution strategy is redesigned to reduce data redundancy during query processing. Furthermore, to improve index scalability, a compressed-coding index structure is designed that stores knowledge triples in compressed form using numeric encoding and relies on the ordered structure of B+ trees. The experimental results show that this solution performs well in index construction speed and disk space usage, and also has advantages in knowledge retrieval performance.
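
The predicate-indexed data model in item (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the class name `PredicateIndex`, the sample product triples, the dictionary encoding of terms to integer IDs, and the use of sorted in-memory lists are all assumptions made for illustration. The sketch only shows the general idea of grouping RDF triples by predicate and storing each group as (subject, object) pairs.

```python
from bisect import bisect_left
from collections import defaultdict


class PredicateIndex:
    """Toy store (hypothetical): predicate ID -> sorted (subject ID, object ID) pairs."""

    def __init__(self, triples):
        self.term_to_id = {}   # term string -> integer ID
        self.id_to_term = []   # integer ID -> term string
        grouped = defaultdict(set)
        for s, p, o in triples:
            grouped[self._encode(p)].add((self._encode(s), self._encode(o)))
        # Sorted pairs let a subject-bound lookup start with a binary search.
        self.pairs = {p: sorted(so) for p, so in grouped.items()}

    def _encode(self, term):
        """Assign the next free integer ID to a term not seen before."""
        if term not in self.term_to_id:
            self.term_to_id[term] = len(self.id_to_term)
            self.id_to_term.append(term)
        return self.term_to_id[term]

    def match(self, predicate, subject=None, obj=None):
        """Answer one triple pattern with a fixed predicate; None means a variable."""
        p = self.term_to_id.get(predicate)
        if p is None or p not in self.pairs:
            return []
        rows = self.pairs[p]
        s_id = o_id = None
        start = 0
        if subject is not None:
            s_id = self.term_to_id.get(subject)
            if s_id is None:
                return []
            # IDs are non-negative, so (s_id, 0) sorts before every pair of s_id.
            start = bisect_left(rows, (s_id, 0))
        if obj is not None:
            o_id = self.term_to_id.get(obj)
            if o_id is None:
                return []
        result = []
        for row_s, row_o in rows[start:]:
            if s_id is not None and row_s != s_id:
                break  # past the run of pairs for the bound subject
            if o_id is None or row_o == o_id:
                result.append((self.id_to_term[row_s], self.id_to_term[row_o]))
        return result


# Hypothetical product triples, used only to exercise the sketch.
triples = [
    ("phone1", "brand", "Acme"),
    ("phone2", "brand", "Acme"),
    ("phone1", "color", "black"),
    ("case1", "compatibleWith", "phone1"),
]
index = PredicateIndex(triples)
print(index.match("brand", subject="phone2"))
print(index.match("brand", obj="Acme"))
```

Because each predicate appears once as a key rather than once per triple, a layout of this kind stores only two IDs per fact, which is one plausible reading of how a predicate index reduces disk usage. An object-bound lookup still scans the predicate's pair list here; a second list sorted by object would avoid that at the cost of extra space.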
Keywords/Search Tags: RDF data, SPARQL query, predicate index, compressed encoded tree, query processing