Font Size: a A A

Research On Accelerating Technique For Key-value Data Storage

Posted on:2013-12-08Degree:MasterType:Thesis
Country:ChinaCandidate:H HuFull Text:PDF
GTID:2268330422473752Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Mass data exploring, business intelligence, applications of Web2.0etc. have posedgreat challenge to the database which is deployed in these systems. Typically, Databases deployed in these new application scenarios should be robust enough, have muchbetter read and write performances, be able to maintain tremendous mass of data, easyto scale whose cost should be low. However, Traditional Relational DatabaseManagement Systems (RDBMS) is not able to perform read and write operations asmuch as10thousands of times per second, moreover, RDBMS has poor efficiency inmaintain PB data, so can’t be used in mass data exploring in PB data. What is more,traditional RDBMS has poor scalability, the cost of horizontal scaling for RDBMS is sounacceptable that the underlying system would be closed temporarily and mass of datawould migrate when RDBMS scales.NoSQL data bases, Key-Value data bases especially, perform excellent in Massdata, high concurrency in visiting, high speed read and write, scalability. First,Key-Value data bases has tremendous read and write speed and thereby can give goodsupport to real-time applications. Meanwhile, comparing with RDBMS, Key-Valuedatabases have good scalability which can scale neatly with low time and space cost.In this paper, we make a research of accelerating technique for Key-Value storage.Firstly, we propose StageDB, a high performance Key-Value data base frameworkwhich is designed to handle Key-Value request in a pipeline way. StageDB has C/Sframework, the server of StageDB handles Key-Value requests in the following pipelinestage: asynchronous event handling stage, Key-Value request package parsing andchecking stage, access controlling stage, data base CRUD stage. In each handling stage,there is a concurrent queue which is responsible for buffer Key-Value requests, and aworker thread pool which is responsible for handling Key-Value requests in theconcurrent queue asynchronously and concurrently.As mentioned in the above paragraph, Key-Value requests are handled in bymulti-thread concurrently and asynchronously. A problem wrapped here is, if severalKey-Value request, which belong to the same user process and thus have read and writecorrelation mutually, are handled by multi-thread, the originally order of these requestswould possibly be violated. In this paper, Acceleration Algorithm of Handling RequestParallelly Based on Analysis of Read and Write Correlation (ARWC) is proposed. Withthe implementation of ARWC in StageDB, all the Key-Value requests are handled inextensively concurrently way and the correlation of read and write requests is kept at thesame time. In the subsequent experiment, StageDB outperform Kyoto Cabinet in therandom write experiment and gain a maximum of6X speedup over Kyoto Cabinet.In the scenario of Multi user visiting, Key-Value handling in StageDB becomes a bottleneck. On the other hand, multi-user visiting is a kind of data-thick applicationtypically, which has relatively simple logic when handling requests and is apt to beaccelerated on GPU. GPU has tremendous capability of concurrent processing, giganticmemory bandwidth and can accelerate the data-thick application at lowcomputing-power cost. In this paper, we conduct a research on the retrieve algorithm ofSkiplist on GPU. A data structure named GSkiplist and its correlating retrieve algorithmare introduced in this paper. Through normalizing the Skiplist nodes which is defined asrandom height in the classical definition, the GPU oriented Skiplist: GSkiplist is moresuitable to run on GPUs. Moreover, the retrieve algorithm of GSkiplist map the forwardnode comparison of GSkiplist node to different GPU threads and handle multipleKey-Value requests concurrently, therefore the efficiency of GSkiplist retrievealgorithm is optimized. In the subsequent experiment, in scenario of multi-user visiting,the GPU oriented GSkiplist retrieve algorithm outperform the counterpart of CPU andgain a maximum of5.5X speedup over the retrieve algorithm of CPU.
Keywords/Search Tags:Key-Value data base, Pipeline handling, Concurrently handlingalgorithm, GPU accelerating algorithm
PDF Full Text Request
Related items