
Near-Data Processing-based Compaction Optimization In LSM-tree-based Key-Value Stores

Posted on: 2020-10-30
Degree: Master
Type: Thesis
Country: China
Candidate: W Liu
Full Text: PDF
GTID: 2428330575454503
Subject: Computer Science and Technology
Abstract/Summary:
Relational databases struggle to deliver high performance and low latency for big-data applications. LSM-tree-based key-value stores, which are widely used in large data centers, offer high throughput and scalability compared with traditional relational databases. However, under random update-intensive workloads, LSM-tree-based key-value stores frequently perform compaction operations, resulting in high write amplification and low system throughput. Existing compaction optimization methods adopt host-side, CPU-centric schemes to minimize write amplification and improve system performance, but they depend heavily on the host's CPU and I/O resources, leaving the system's overall resources underutilized. This work proposes two compaction optimization schemes, Co-KV and DStore, based on the near-data processing model in the storage system. Both schemes execute compaction tasks in parallel on the host and the device, which relieves host CPU overhead and reduces data transfer between the two sides, thereby improving the performance of the LSM-tree-based storage system.

First, a static compaction optimization strategy for the LSM-tree, named Co-KV, is proposed. Co-KV is a near-data-processing-framework-based compaction scheme in which host-side and device-side compaction operations are performed collaboratively. When a compaction is triggered, the host-side subsystem determines the SSTables involved and partitions them into two sub-tasks according to a predefined static partitioning ratio. It then offloads one sub-task to the host side and the other to the near-data-processing-enabled device through a semantic management system, and the compaction managers on both sides execute their tasks in parallel. By exploiting the computing resources of both subsystems, Co-KV reduces host CPU utilization and I/O overload and thus improves compaction performance. The OED platform is employed to validate the feasibility of Co-KV: under db_bench, LevelDB achieves a throughput of about 3.5 MB/s, which Co-KV raises to 5.9 MB/s.

Although Co-KV improves the performance of LevelDB, it cannot adapt dynamically to different workloads and platforms; the static strategy may increase the latency of compaction tasks and degrade system performance. DStore is therefore proposed to address this limitation of Co-KV by offloading compaction tasks dynamically according to the demand on each side. DStore computes the offloading ratio of compaction tasks from the compaction computing capability of each side and sends the ratio, together with the compaction task, to the device. The device-side module partitions the compaction task into many sub-tasks according to the offloading ratio and places them in a double-ended queue. Two task threads, one per side, then pull sub-tasks from the queue on demand according to their respective computing capabilities. DStore not only overcomes the weaknesses of Co-KV's static offloading and task partitioning but also shortens compaction latency on each side, avoiding compaction performance degradation. Results from the underlying platform show that DStore improves write throughput by 2.3x and 5x over Co-KV and LevelDB, respectively. For scalability experiments, DStore, Co-KV, and LevelDB are also deployed on platforms with (1) various ratios of internal to external bandwidth, (2) different computing capabilities, and (3) the OED platform. Experimental results show that DStore holds a clear performance advantage over the other systems.
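The static partitioning step of Co-KV can be sketched as follows. This is a minimal illustration, assuming a size-balanced split of the compaction's SSTables by a fixed host/device ratio; the names (`SSTable`, `partition_compaction`, `HOST_RATIO`) and the greedy balancing heuristic are illustrative assumptions, not the thesis's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

HOST_RATIO = 0.5  # predefined static partitioning ratio (assumed value)

@dataclass
class SSTable:
    table_id: int
    size_bytes: int

def partition_compaction(tables: List[SSTable],
                         host_ratio: float = HOST_RATIO
                         ) -> Tuple[List[SSTable], List[SSTable]]:
    """Split the SSTables of one compaction into a host-side and a
    device-side sub-task, balancing by cumulative size."""
    total = sum(t.size_bytes for t in tables)
    host_task, device_task, acc = [], [], 0
    # Greedily assign the largest tables to the host until its size
    # budget (total * host_ratio) is exhausted; the rest go to the device.
    for t in sorted(tables, key=lambda t: t.size_bytes, reverse=True):
        if acc + t.size_bytes <= total * host_ratio:
            host_task.append(t)
            acc += t.size_bytes
        else:
            device_task.append(t)
    return host_task, device_task

tables = [SSTable(i, s) for i, s in enumerate([64, 32, 32, 16, 16])]
host, device = partition_compaction(tables)
print(sum(t.size_bytes for t in host), sum(t.size_bytes for t in device))
# → 80 80
```

Because the ratio is fixed ahead of time, a mismatch between the ratio and the actual capabilities of the two sides leaves one side idle while the other lags, which is exactly the weakness DStore addresses.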
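DStore's on-demand scheduling can likewise be sketched: the sub-tasks of one compaction go into a shared double-ended queue, and a host thread and a device thread each pull work as fast as their (here, simulated) computing capability allows, so the faster side naturally absorbs more sub-tasks. The function name, per-task costs, and pop-from-opposite-ends convention are illustrative assumptions, not the thesis's implementation.

```python
import threading
import time
from collections import deque

def run_compaction(num_subtasks: int) -> dict:
    queue = deque(range(num_subtasks))   # sub-task IDs from partitioning
    lock = threading.Lock()
    done = {"host": [], "device": []}

    def worker(side: str, cost_per_task: float) -> None:
        while True:
            with lock:
                if not queue:
                    return
                # host pops from the front, device from the back
                task = queue.popleft() if side == "host" else queue.pop()
            time.sleep(cost_per_task)  # simulate compaction work
            done[side].append(task)

    # assumed relative speeds: host 4x faster than the device
    host = threading.Thread(target=worker, args=("host", 0.001))
    device = threading.Thread(target=worker, args=("device", 0.004))
    host.start(); device.start()
    host.join(); device.join()
    return done

result = run_compaction(20)
print(len(result["host"]), len(result["device"]))  # every sub-task runs exactly once
```

Pulling from the queue on demand, rather than committing to a fixed split up front, is what lets DStore keep both sides busy when their speeds differ or vary over time.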
Keywords/Search Tags: LSM-Tree Key-Value Store, Near-Data Processing, Compaction Task Offloading and Partitioning, On-demand Scheduling