Font Size: a A A

Design And Implementation Of Bidirectional Mapping Data Structure For RDF Datasets

Posted on:2021-05-30Degree:MasterType:Thesis
Country:ChinaCandidate:Y Z ZengFull Text:PDF
GTID:2518306503974149Subject:Software engineering
Abstract/Summary:PDF Full Text Request
With the development of the times and the popularization of big data,the amount of data held by human is increasing.The development of data structure,the way of storing and organizing data by computers,is constantly keeping pace with the times.The bidirectional mapping data structure as a important data structure is widely used in those bidirectional mapping storing and retrieving application,such as data remapping optimization,DNS,etc.As a data model in new era,RDF is usually used to describe the web resources and the relationships between resources.It has been adopted by more and more public knowledge bases and web contents storing.For those applications on RDF dataset,like query engine Wukong[1-3]and computing framework Power Graph[4],they generally use data remap-ping optimization to optimize the storage and calculation,and data remap-ping optimization requires bidirectional mapping data structure to store the bidirectional mapping relationship between source data and remapped data.For those applications,the storage cost and query performance of the bidi-rectional mapping data structure become the main determinants of money cost and user experience.Therefore,it's necessary to study the bidirectional mapping data structure in those applications on RDF dataset.We proposed burst-trie-based bidirectional mapping data structure Bi-trie based on two observations that RDF dataset contains a large number of common string sequences and bidirectional mapping will store data repeat-edly in forward mapping and backward mapping.As for forward mapping storing in Bi-trie,we adopt burst trie to efficiently store the common string se-quences part in forward mapping key-value pairs.As for backward mapping storing in Bi-trie,we adopt unordered map to store the backward mapping key-value pairs by reusing the string data stored in forward mapping instead of storing the complete string in unordered map.To sum up,Bi-trie greatly reduces the bidirectional mapping storage cost on RDF dataset.In summary,our paper makes the following contributions:First,we analyzed the characteristics of RDF dataset and the problems in bidirectional mapping data structure.And we proposed a bidirectional mapping storing scheme based on burst trie,data reusing that greatly reduces the bidirectional mapping dataset storing.Secondly,we designed and implemented the bidirectional mapping data structure Bi-trie based on burst trie,and the storage utilization of Bi-trie is improved through multiple designs,such as the reuse of the forward mapping and the backward mapping storage,and the children representation of trie node,the hash collision resolution strategy of hash node in burst trie,the page management module,etc.Thirdly,we designed and implemented three optimizations on its insert and query process:As for the insertion,a large number of common string sequences in RDF dataset will cause redundant burst process during the insertion on burst trie;As for the forward mapping query,the character-by-character matching way on burst trie is less efficient;As for the backward mapping query,because the backward mapping storing reuses the data in burst trie in forward mapping storing,the process of getting the complete string through an burst trie is less efficient.Hence,we propose three opti-mizations to avoid those inefficiencies of Bi-trie,including redundant burst avoiding,fast path and prefix gathering.
Keywords/Search Tags:Bidirectional mapping, Burst trie, Hash table, RDF
PDF Full Text Request
Related items