Research On Accelerating Technique For Key-value Data Storage

Posted on:2013-12-08

Degree:Master

Type:Thesis

Country:China

Candidate:H Hu

Full Text:PDF

GTID:2268330422473752

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

Mass data exploring, business intelligence, applications of Web2.0etc. have posedgreat challenge to the database which is deployed in these systems. Typically, Databases deployed in these new application scenarios should be robust enough, have muchbetter read and write performances, be able to maintain tremendous mass of data, easyto scale whose cost should be low. However, Traditional Relational DatabaseManagement Systems (RDBMS) is not able to perform read and write operations asmuch as10thousands of times per second, moreover, RDBMS has poor efficiency inmaintain PB data, so can’t be used in mass data exploring in PB data. What is more,traditional RDBMS has poor scalability, the cost of horizontal scaling for RDBMS is sounacceptable that the underlying system would be closed temporarily and mass of datawould migrate when RDBMS scales.NoSQL data bases, Key-Value data bases especially, perform excellent in Massdata, high concurrency in visiting, high speed read and write, scalability. First,Key-Value data bases has tremendous read and write speed and thereby can give goodsupport to real-time applications. Meanwhile, comparing with RDBMS, Key-Valuedatabases have good scalability which can scale neatly with low time and space cost.In this paper, we make a research of accelerating technique for Key-Value storage.Firstly, we propose StageDB, a high performance Key-Value data base frameworkwhich is designed to handle Key-Value request in a pipeline way. StageDB has C/Sframework, the server of StageDB handles Key-Value requests in the following pipelinestage: asynchronous event handling stage, Key-Value request package parsing andchecking stage, access controlling stage, data base CRUD stage. In each handling stage,there is a concurrent queue which is responsible for buffer Key-Value requests, and aworker thread pool which is responsible for handling Key-Value requests in theconcurrent queue asynchronously and concurrently.As mentioned in the above paragraph, Key-Value requests are handled in bymulti-thread concurrently and asynchronously. A problem wrapped here is, if severalKey-Value request, which belong to the same user process and thus have read and writecorrelation mutually, are handled by multi-thread, the originally order of these requestswould possibly be violated. In this paper, Acceleration Algorithm of Handling RequestParallelly Based on Analysis of Read and Write Correlation (ARWC) is proposed. Withthe implementation of ARWC in StageDB, all the Key-Value requests are handled inextensively concurrently way and the correlation of read and write requests is kept at thesame time. In the subsequent experiment, StageDB outperform Kyoto Cabinet in therandom write experiment and gain a maximum of6X speedup over Kyoto Cabinet.In the scenario of Multi user visiting, Key-Value handling in StageDB becomes a bottleneck. On the other hand, multi-user visiting is a kind of data-thick applicationtypically, which has relatively simple logic when handling requests and is apt to beaccelerated on GPU. GPU has tremendous capability of concurrent processing, giganticmemory bandwidth and can accelerate the data-thick application at lowcomputing-power cost. In this paper, we conduct a research on the retrieve algorithm ofSkiplist on GPU. A data structure named GSkiplist and its correlating retrieve algorithmare introduced in this paper. Through normalizing the Skiplist nodes which is defined asrandom height in the classical definition, the GPU oriented Skiplist: GSkiplist is moresuitable to run on GPUs. Moreover, the retrieve algorithm of GSkiplist map the forwardnode comparison of GSkiplist node to different GPU threads and handle multipleKey-Value requests concurrently, therefore the efficiency of GSkiplist retrievealgorithm is optimized. In the subsequent experiment, in scenario of multi-user visiting,the GPU oriented GSkiplist retrieve algorithm outperform the counterpart of CPU andgain a maximum of5.5X speedup over the retrieve algorithm of CPU.

Keywords/Search Tags:

Key-Value data base, Pipeline handling, Concurrently handlingalgorithm, GPU accelerating algorithm

PDF Full Text Request

Related items

1	An Analysis of Successive and Concurrently Available Stimuli in a Tactile Paired Associate Sorting Tas
2	Research On Accelerating Convolutional Neural Networks Via Eliminating Weight And Feature Redundancy
3	Accelerating similarly structured data
4	The Design And Implementation Of Account Handling System Based On Java EE Architecture
5	Research And Implementation On Data Analysis System Of Gas Pipeline
6	The Research And Implementation Of The Knowledge Base Based On Media Wiki
7	Research On An Exception Handling Model In Workflow
8	The Optimization Study Of Equipment Configuration On Handling Technology In Ore Teminal
9	Pipeline Research Of ASIP Base On Petri Net
10	An evaluation of a pipeline planner for a filter based algorithm