Design And Implementation Of RDMA-based Secondary Index In Distributed Systems

Posted on: 2019-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: X Xue
Full Text: PDF
GTID: 2428330590992455
Subject: Software engineering
Abstract/Summary:
The arrival of the Internet age has accelerated explosive data growth. More and more enterprises recognize the potential wealth that Big Data brings, as well as its considerable influence on enterprise development. Alongside the potential value of Big Data comes a stronger demand for computing power over it, which challenges database products, the key products in the data processing market. To achieve better performance, existing databases are increasingly designed as distributed systems. Range query performance, supported by distributed secondary indexes, is an important part of database technology and a hot topic in both academia and industry. The key to improving range query performance is to locate in-range data and access it efficiently. To that end, researchers have pursued two directions: adopting better secondary indexes, and optimizing the distribution and layout of indexed data. However, some problems still hurt range query performance, such as data serialization cost and CPU interruption cost.

This thesis thoroughly analyzes the limitations of current range query designs and implementations, as well as the opportunities they leave open. By exploiting the backup mechanism in an OLAP environment, we design and implement a technique that restructures the backup into an indexed backup and answers range queries over it, without compromising the efficiency of point queries. The main contributions of this thesis are as follows:

· We thoroughly analyze the limitations of current range query designs and implementations. From the perspective of the secondary index, it is widely implemented as an index tree to preserve the logical order of indexed values, and is often distributed across many nodes. When a remote tree index is traversed, RPCs frequently interrupt the remote CPU. Using RDMA avoids this, but amplifies the number of one-sided operations instead. After the tree index is traversed, range conditions are translated into multiple point search requests, which become the bottleneck of the whole query. From the perspective of optimizing data distribution, the serialization cost matters a great deal as long as the in-range data is stored in a dispersed manner. A column store can provide ideal data locality, but at the cost of degraded point query performance. Redundant data replicas can be added to speed up range query processing, but they bring non-negligible storage overhead.

· We design and implement a new range search technique based on our indexed backup. The replication-based backup mechanism provides durability and recoverability. Because backup data is the same as primary data, it can be transformed into a range-friendly replica without compromising point queries on primary data. Our indexed backup is well suited to answering range queries over RDMA, which improves overall efficiency.

· We conduct various experiments on the indexed backup to examine its performance on TPC-H, an OLAP benchmark. The evaluation shows that the indexed backup provides a 1.1X to 11.2X speedup in calculation by eliminating serialization cost, and that latency drops to as low as 31.3%. The experiments also show that our approach scales well.
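
The contrast between the two range query paths described above can be illustrated with a small in-memory sketch. It is not the thesis's implementation: the class names, the (primary key, secondary value) row layout, and the `reads` counter are all assumptions made for illustration, and the counter only stands in for remote accesses such as one-sided RDMA reads. The first class answers a range query by consulting a sorted secondary index and then issuing one point lookup per match; the second keeps the replica itself sorted by the secondary value, so the matching records form one contiguous run.

```python
from bisect import bisect_left, bisect_right


class PrimaryWithIndex:
    """Hypothetical primary replica: records keyed by primary key,
    plus a separate sorted secondary index of (value, primary key)."""

    def __init__(self, rows):
        self.records = dict(rows)  # primary key -> secondary value
        self.index = sorted((sec, pk) for pk, sec in rows)
        self.index_keys = [sec for sec, _ in self.index]
        self.reads = 0  # stand-in for remote accesses (assumption)

    def range_query(self, lo, hi):
        start = bisect_left(self.index_keys, lo)
        end = bisect_right(self.index_keys, hi)
        self.reads += 1  # index traversal, simplified to a single access
        result = []
        for _, pk in self.index[start:end]:
            self.reads += 1  # one point lookup per in-range record
            result.append((pk, self.records[pk]))
        return result


class IndexedBackup:
    """Hypothetical backup replica stored in secondary-value order."""

    def __init__(self, rows):
        self.sorted_rows = sorted(rows, key=lambda r: r[1])
        self.keys = [sec for _, sec in self.sorted_rows]
        self.reads = 0  # stand-in for remote accesses (assumption)

    def range_query(self, lo, hi):
        start = bisect_left(self.keys, lo)
        end = bisect_right(self.keys, hi)
        self.reads += 1  # locate the range boundaries
        self.reads += 1  # fetch the contiguous run in one access
        return self.sorted_rows[start:end]


rows = [(1, 50), (2, 10), (3, 30), (4, 20), (5, 40)]
primary = PrimaryWithIndex(rows)
backup = IndexedBackup(rows)
from_primary = primary.range_query(20, 40)
from_backup = backup.range_query(20, 40)
print(from_primary == from_backup, primary.reads, backup.reads)
```

In this toy model the access count on the primary path grows with the number of matching records, while the sorted replica needs a fixed number of accesses per query. Real costs depend on record sizes, index depth, and network behavior, none of which are modeled here.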
Keywords/Search Tags:Distributed Database, Secondary Index, Range Query, Indexed Backup