
A Distributed B+ Tree For Big Data Stream

Posted on: 2020-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: C X Lu
Full Text: PDF
GTID: 2428330599476466
Subject: Computer Science and Technology
Abstract/Summary:
With the arrival of the big data era, the ways data is generated and applied have become more diverse. As a special form of big data, a data stream is characterized by timeliness, unboundedness, and burstiness. Because of its high value, it is widely used in many fields. However, the high speed and large volume of data streams pose great challenges for real-time processing, storage, and query. To address this, this thesis presents a distributed index structure that supports efficient storage and query of data streams. The contributions are as follows:

1. A distributed B+ tree index structure for data streams, WB-Index, is proposed. WB-Index adopts a master-slave index structure and splits the data stream by time window. Within each time window, a B+ tree index is built over the content of the tuples as the bottom index. For each consecutive time window, the window's timestamp and the metadata of the corresponding bottom index form a <key,value> pair that is inserted into the top index. WB-Index distributes the bottom indexes across multiple nodes to ease index maintenance. In the WB-Index system architecture, tuple storage, index construction, and query are handled by separate nodes, so that data streams can be stored and queried efficiently.

2. An efficient index construction method is proposed for WB-Index, since construction efficiency is critical. For the bottom index, a batch construction method accelerates index building. For the top index, a non-splitting B+ tree update method based on pre-allocated node space ensures update efficiency and improves space utilization.

3. An efficient persistence method is proposed for WB-Index. Because a data stream is unbounded, the thesis stores the data stream and the distributed index structure in a distributed file system and designs a compact storage format to reduce storage overhead. To improve query efficiency after the index is persisted, a secondary index structure is added to the bottom index to filter out unnecessary queries, and new data and hot data are cached to further improve query efficiency.

WB-Index supports efficient storage and query of data streams; its validity is demonstrated through calculation and experiment.
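
The two-level layout described in contribution 1 can be illustrated with a minimal single-process sketch. It is not the thesis's implementation: sorted Python lists searched with `bisect` stand in for the B+ trees, nothing is distributed or persisted, and the class name `WindowedIndex`, its methods, and the fixed `window_size` parameter are all hypothetical names chosen for illustration. Sealing a window sorts its buffered tuples once, which loosely mirrors the idea of batch construction for the bottom index.

```python
import bisect
from collections import defaultdict


class WindowedIndex:
    """Toy two-level index (assumption: sorted lists replace B+ trees).

    Top index:    sorted list of window start timestamps.
    Bottom index: per window, rows sorted by tuple key.
    """

    def __init__(self, window_size):
        self.window_size = window_size
        self.window_starts = []            # top index keys, kept sorted
        self.bottom = {}                   # window start -> (keys, rows)
        self._buffer = defaultdict(list)   # open windows, unsorted rows

    def _window_start(self, timestamp):
        return timestamp - timestamp % self.window_size

    def insert(self, timestamp, key):
        """Buffer a tuple in the window that contains its timestamp."""
        self._buffer[self._window_start(timestamp)].append((key, timestamp))

    def seal(self, start):
        """Build the bottom index for one window in a single batch sort,
        then register the window in the top index."""
        rows = sorted(self._buffer.pop(start))
        keys = [key for key, _ in rows]
        self.bottom[start] = (keys, rows)
        bisect.insort(self.window_starts, start)

    def query(self, t_lo, t_hi, k_lo, k_hi):
        """Return (key, timestamp) rows from sealed windows with
        t_lo <= timestamp <= t_hi and k_lo <= key <= k_hi."""
        # Top index: select windows that can overlap the time range.
        i = bisect.bisect_left(self.window_starts, self._window_start(t_lo))
        j = bisect.bisect_right(self.window_starts, t_hi)
        result = []
        for start in self.window_starts[i:j]:
            keys, rows = self.bottom[start]
            # Bottom index: range scan over the sorted keys.
            a = bisect.bisect_left(keys, k_lo)
            b = bisect.bisect_right(keys, k_hi)
            result.extend(r for r in rows[a:b] if t_lo <= r[1] <= t_hi)
        return result


if __name__ == "__main__":
    idx = WindowedIndex(window_size=10)
    for ts, key in [(1, "b"), (3, "a"), (12, "c"), (25, "a")]:
        idx.insert(ts, key)
    for start in (0, 10, 20):
        idx.seal(start)
    print(idx.query(0, 15, "a", "b"))
```

A query first narrows the search to the windows overlapping the requested time range and only then scans keys inside each window, which is the reason for keying the top index by window timestamp.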
Keywords/Search Tags: data stream, B+ tree, distributed index, distributed storage