
Research On Distributed Storage And Retrieval Technology Of Large-scale Knowledge Graph

Posted on: 2020-02-22; Degree: Master; Type: Thesis
Country: China; Candidate: C Peng
GTID: 2428330590983211; Subject: Computer technology
Abstract/Summary:
Distributed storage is one way to cope with rapidly growing data sets. As knowledge graphs expand rapidly in scale, they inevitably face this storage problem. Current distributed partitioning algorithms for non-relational data sets can lead to problems such as unbalanced storage load and uneven distribution of relationship-node density. Building on distributed relational data sets and on the characteristics of the knowledge graph data structure, this thesis partitions the data set using lexical semantic similarity and average node degree, while taking into account load balancing and the redundancy of node relationships that cross servers. Sub-graph retrieval exploits the characteristics of this partitioning: a pruning method based on decrementing node degree decomposes the query sub-graph into multiple sub-trees of height 2, which are then compared with sub-trees obtained from the stored data. The system design is divided into two parts. In the distributed storage part, the data set is partitioned according to the method proposed in this thesis. In the sub-graph retrieval part, the main steps are as follows: first, the query sub-graph is divided, according to the characteristics of the data set distribution, into sub-trees of height 2 that contain only a root node and its leaf nodes; next, the root nodes of all the sub-trees are queried to obtain the trees that include all of their direct relationships; these trees are then compared with the sub-trees of the query; finally, the query result is obtained. The system uses simulated data as its experimental data set. The sub-graph retrieval experiments show that, under the partitioning method adopted in this thesis, sub-graph retrieval over distributed storage takes less time than under the commonly used hash-based distributed storage method. In addition, given the characteristics of graph queries, query time with node-relationship redundancy is lower than without redundancy. The hash-based distributed method produces more redundancy than the partitioning method adopted in this thesis: it needs more redundant data and more storage space, and its graph queries also take more time.
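The decomposition step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact pruning rule, so the code below assumes one plausible reading: repeatedly take the node with the highest remaining degree as a root, emit it together with its remaining neighbours as a height-2 sub-tree, and remove those edges so that the neighbours' degrees are decremented. The function names `decompose_into_stars` and `star_is_contained`, the undirected unlabeled edge model, and the tie-breaking rule are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def decompose_into_stars(edges):
    """Split an undirected query graph into height-2 sub-trees (root + leaves).

    Assumed rule (not confirmed by the source): pick the node with the highest
    remaining degree as a root, emit it with all remaining neighbours as
    leaves, then delete those edges, decrementing each neighbour's degree.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    stars = []
    while True:
        # Sorting first makes tie-breaking deterministic (smallest name wins).
        candidates = [n for n in sorted(adj) if adj[n]]
        if not candidates:
            break
        root = max(candidates, key=lambda n: len(adj[n]))
        leaves = sorted(adj[root])
        stars.append((root, leaves))
        for leaf in leaves:
            adj[leaf].discard(root)  # degree of each leaf drops by one
        adj[root].clear()
    return stars


def star_is_contained(data_adj, root, leaves):
    """Check one query sub-tree against the direct neighbours of its root.

    Simplification: query nodes are assumed to be concrete entity ids, and
    data_adj maps each stored entity to the set of its direct neighbours.
    """
    return set(leaves) <= data_adj.get(root, set())


query_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e")]
query_stars = decompose_into_stars(query_edges)
```

Each query edge ends up in exactly one sub-tree, so checking every sub-tree against the neighbourhood fetched for its root covers the whole query under these assumptions. A real system would also have to handle edge labels, direction, and variable nodes, which this sketch leaves out.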
Keywords/Search Tags: Knowledge Graph, Distributed, Sub-graph Matching, Semantic Similarity, Pruning Rule