Font Size: a A A

Research On RDF Data Partitioning And Index Methods

Posted on:2019-08-13Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y L LengFull Text:PDF
GTID:1368330548484753Subject:Software engineering
Abstract/Summary:PDF Full Text Request
The wide application of RDF(Resource Description Framework)data makes data volume grow rapidly.Distributed storage has become an effective method to solve large-scale data storage and management because of its high throughput,parallel processing,high scalability and high availability.One of the key technologies is the effective data partitioning.However,the uneven distribution of the vertex in RDF graph and the constantly updated changes make a serious challenge for RDF data partitioning.In addition,storage and index are the key to efficient data retrieval.The semantic and structure are two basic features of RDF data.How to combine the semantic and structure to store and index RDF data is another important challenge.Aiming at these challenges and existing problems,this dissertation studies the corresponding schemes,such as the balanced partitioning algorithm,incremental partitioning,path and star structure index.The main contributions are as follows:(1)The existing multi-level graph partitioning methods for RDF cannot effectively facilitate the power law distribution and the asymmetric relation between vertices.To tackle this problem,a static multi-level RDF graph partitioning method is proposed.The method improves the update and propagation mechanism of the label propagation algorithm,by using modularity to update the node labels and the energy attenuation to prevent the monster label.Finally,a balance adjustment mechanism is introduced to control the balance of graph partitioning on the coarsening graph.Furthermore,two optimization methods are proposed to improve the initial partitioning graph quality.Experiments validate that the proposed method can improve the coarsening speed and balance the coarsening scale of RDF graph.(2)The existing incremental partitioning methods do not consider both the dynamics of RDF data and the contraction-expansion of storage nodes.To address it,a dynamic incremental RDF graph partitioning method is proposed.The method designs an objective function based on edge cut and load balancing.With this function,the method first improves the k way greedy grow graph partitioning for the partitioning of existing data.Then,the adjustment scheme of the incremental data partitioning is given from three aspects:the augmentation and deletion of the triples,the overall balancing and the contraction-expansion of storage nodes.Experiments validate that the proposed method can handle incremental data partitioning effectively and maintain the overall performance.(3)The existing retrieval methods for complex SPARQL query may not be effective enough for its frequent connections.To address this problem,a depth oriented two level index and storage scheme based on path is proposed.In the filter level,the path template tree index uses the semantic association features commonly existing in chain structure to filter the candidate full path sets.In the matching level,the hierarchical edge index uses the known predicate to realized the backwards and forwards retrieval effectively.Meanwhile,the compressed storage based on k2-tree is implemented on each hierarchical index.Experiments prove that the proposed index and storage scheme can handle complex queries effectively,reduce the size of the intermediate result connection,and improve the utilization of storage space dramatically.(4)The existing distributed index methods based on star structure may not be effective for its partitioning efficiency and the frequent connection communication.To tackle it,a breadth oriented distributed RDF index scheme is proposed.First,the index scheme constructs the weighted graph based on the star structure obtained by the breadth first search method,and realizes the effective partitioning.Then,a compressed linked S-tree index is created for each storage node to achieve quick retrieval based on star structure.Furthermore,the compressed logical operation is designed to avoid the decoding operation.Experiments prove that the distributed index based on star structure can improve the parallel query effectively and the retrieval efficiency.
Keywords/Search Tags:RDF graph, SPARQL query, multi-level balanced graph partitioning, incremental partitioning, depth index, breadth index
PDF Full Text Request
Related items