The Study Of Customizable Accuracy Scan Algorithm For Key-value Data

Posted on: 2016-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: X X Zhou
Full Text: PDF
GTID: 2348330479954337
Subject: Software engineering
Abstract/Summary:
In the age of big data, Internet storage workloads are characterized by high concurrency, large scale, and greater flexibility. The RDBMS (Relational Database Management System), widely used in traditional settings and characterized by fixed normal forms, cannot adapt to unstructured data and performs poorly in data storage and access. Key-Value storage systems have therefore become the mainstream approach to processing unstructured data. For data analysis on such a storage system, however, scanning all of the stored data is often unnecessary: a subset of the Key-Value data, obtained with less time and fewer I/O operations, can be used for the analysis, even though the result may be inexact. This thesis presents a new scan algorithm for Key-Value storage systems based on the LSM-tree (Log-Structured Merge tree). The contributions are as follows.
(1) Summarize FullSCAN, the standard scan algorithm of LSM-tree based storage systems, and test it on different storage media. For customizable accuracy scans, FullSCAN wastes I/O and takes longer than necessary.
(2) Describe SamplingSCAN, a scan algorithm based on probability sampling, which meets the needs of customizable accuracy scans by selecting scan targets at random. SamplingSCAN achieves only limited gains because it does not take the characteristics of the storage system into account.
(3) Propose a customizable accuracy scan algorithm named CustomizableSCAN. Using Bloom filters, it estimates the number of Key-Value pairs in the storage system without performing I/O on the Key-Value data, adjusts the estimation error according to specific rules, and selects scan targets from which to read Key-Value data. CustomizableSCAN can reduce disk I/O and scan time.
(4) Formulate test plans, then test and compare the three algorithms (FullSCAN, SamplingSCAN, CustomizableSCAN) under varying stored-data characteristics and storage media.
The evaluation results show that, in the overwhelming majority of cases, CustomizableSCAN reduces scan time while guaranteeing the accuracy of the scan result, and delivers outstanding performance.
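The idea of estimating Key-Value quantity from Bloom filters alone can be illustrated with a minimal Python sketch. It is not the thesis's CustomizableSCAN implementation: the abstract does not give the estimator or the error-adjustment rules, so this sketch assumes the standard set-bit estimate n ≈ -(m/k) ln(1 - X/m) for a filter with m bits, k hash functions, and X set bits. The `BloomFilter` class, the `select_tables` helper, and the largest-first selection policy are all hypothetical names and choices made for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Tiny Bloom filter with m bits and k hash functions (illustrative only)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, key: bytes):
        # Double hashing: derive k positions from two halves of one digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def estimate_count(self) -> float:
        # Assumed estimator: n ~ -(m / k) * ln(1 - X / m), X = number of set bits.
        set_bits = sum(self.bits)
        if set_bits >= self.m:
            return float("inf")  # a saturated filter carries no count information
        return -(self.m / self.k) * math.log(1 - set_bits / self.m)


def select_tables(filters, accuracy):
    """Hypothetical policy: take tables, largest estimate first, until the
    estimated share of keys covered reaches `accuracy` (a fraction in (0, 1])."""
    estimates = [f.estimate_count() for f in filters]
    total = sum(estimates)
    order = sorted(range(len(filters)), key=lambda i: estimates[i], reverse=True)
    chosen, covered = [], 0.0
    for i in order:
        if covered >= accuracy * total:
            break
        chosen.append(i)
        covered += estimates[i]
    return chosen, estimates
```

In this sketch only the in-memory filters are consulted when choosing scan targets, so no Key-Value data is read until the selected tables are actually scanned. The sketch ignores keys that are duplicated or overwritten across LSM-tree levels, which a real system would have to correct for.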
Keywords/Search Tags: Key-Value storage, LSM-tree index, customizable accuracy scan