Research And Application Of Distributed RocksDB Based On Separation Of Storage And Computing

Posted on:2022-11-17

Degree:Master

Type:Thesis

Country:China

Candidate:D Wu

Full Text:PDF

GTID:2518306773496514

Subject:FINANCE

Abstract/Summary:

At present, most open-source distributed databases adopt an architecture in which computing and storage are coupled. This has several disadvantages: computing and storage resources become unbalanced, the ratio of CPU to memory is hard to adjust, and scaling out requires migrating a large amount of data. In the Internet era, with rapid growth in concurrency and data volume, the CPU-to-memory ratio and storage capacity of a database need to be adjusted continually, and the defects of coupling computing and storage are becoming more and more obvious. As network latency falls, architectures that separate computing from storage have become a new development direction.

Taking RocksDB as an example, this paper studies how to transform a key-value database so that computing and storage are separated, and designs and implements a distributed RocksDB database. RocksDB uses a Write-Ahead Log (WAL) and a compaction mechanism, and it never updates data files in place. Its files (SST, WAL, MANIFEST, and others) are therefore well suited to an append-only High-Performance Distributed File System (HPDFS) as the underlying persistent storage. This paper transforms RocksDB based on the separated architecture and uses heterogeneous storage, RDMA (Remote Direct Memory Access), and SPDK (Storage Performance Development Kit) to speed up writes and reads and to offset the performance cost that remote storage imposes on the database side.

The thesis mainly completes the following tasks:

(1) RocksDB is transformed into a distributed database with one master and multiple slaves, and the elected master is registered in ZooKeeper. In addition, a distributed file lock is designed on the HPDFS developed by the author's team. The lock confirms the uniqueness of the master so that RocksDB is never written by two nodes at once. The master service node serves reads and writes, and the slave service nodes serve reads. After the master service node fails, one of the other service nodes is elected as the new master, which achieves high availability.

(2) The underlying file storage interface of RocksDB is reworked so that the remote distributed file system is accessed through the SDK provided by HPDFS, and features such as data disaster recovery, replica recovery, and redundant compression are integrated into the distributed file system. Data from different RocksDB instances is stored in different HPDFS directories to achieve data isolation.

(3) The speed at which WAL and MANIFEST files are written affects the overall write performance of RocksDB. The thesis therefore proposes storing RocksDB data on heterogeneous media to meet both performance and cost requirements. The system can use two storage media at the same time, Persistent Memory (PM) and Solid State Disk (SSD): PM stores the WAL and MANIFEST files, which hold less data, and relatively cheap devices such as SSDs store the SST files. In addition, for services that read frequently and need high read performance, this paper proposes a PM cache on the computing node side to hold frequently read index files, which are relatively small.

(4) The write bottleneck of RocksDB lies mainly in appending to the WAL, because every write request blocks until its WAL record has been persisted. Native batch writing in RocksDB suffers from lock waiting and blocking. In this research, writes go through a queue and are issued asynchronously: the writing thread wraps each request in a Context and places it in the Pending queue, and a polling thread then fetches requests from the queue in batches, assembles them, and sends them to the remote distributed file system. This asynchronous mode improves system throughput. To address the network latency of remote storage, this study uses RDMA, which bypasses the kernel and avoids copying data between user space and kernel space, to reduce network latency, and uses SPDK to improve SSD read speed. Together, RDMA and SPDK effectively reduce the impact of latency.

(5) Through experiments, the thesis compares performance before and after optimization with batch writing, PM, RDMA, SPDK, and other technologies. The reliability, scalability, and load balancing of the distributed RocksDB system are verified through experiments covering machine failure, disk failure, network failure, adding computing nodes, and adding storage nodes.

Experiments show that the distributed RocksDB database designed and implemented in this study, with separate computing and storage, has strong reliability and scalability. With all I/O optimizations combined, the storage performance of distributed RocksDB is close to that of RocksDB on a local SSD file system. At present, the distributed RocksDB storage system is in trial operation for metadata disaster recovery storage in an object storage system. Online instances are monitored with Grafana, and the system as a whole runs stably.
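As a rough illustration of the queue-based asynchronous write path described in task (4), the following Python sketch shows writer threads wrapping requests in a Context, placing them on a pending queue, and a polling thread draining the queue in batches before issuing one append. It is not the thesis's implementation: the names `BatchedWalWriter`, `submit`, `append_fn`, and `max_batch` are hypothetical, and `append_fn` merely stands in for whatever append call the HPDFS SDK exposes, which the abstract does not describe.

```python
import queue
import threading
from concurrent.futures import Future
from dataclasses import dataclass, field


@dataclass
class Context:
    """One write request waiting for its WAL record to be persisted (hypothetical type)."""
    record: bytes
    done: Future = field(default_factory=Future)


class BatchedWalWriter:
    """Sketch of a pending queue drained by a single polling thread."""

    def __init__(self, append_fn, max_batch=64):
        self._append = append_fn        # assumption: stand-in for a remote append call
        self._pending = queue.Queue()   # the "Pending queue" from the abstract
        self._max_batch = max_batch
        self._thread = threading.Thread(target=self._poll, daemon=True)
        self._thread.start()

    def submit(self, record: bytes) -> Future:
        """Called by writer threads: enqueue and return immediately."""
        ctx = Context(record)
        self._pending.put(ctx)
        return ctx.done

    def close(self):
        """Ask the polling thread to stop once earlier requests are drained."""
        self._pending.put(None)
        self._thread.join()

    def _poll(self):
        while True:
            first = self._pending.get()      # block until there is work
            if first is None:
                return
            batch, stop = [first], False
            while len(batch) < self._max_batch:
                try:
                    ctx = self._pending.get_nowait()
                except queue.Empty:
                    break
                if ctx is None:
                    stop = True
                    break
                batch.append(ctx)
            try:
                # One append for the whole batch. A real WAL would frame each
                # record with a length and checksum; plain joining is for brevity.
                self._append(b"".join(c.record for c in batch))
            except Exception as exc:
                for c in batch:
                    c.done.set_exception(exc)
            else:
                for c in batch:
                    c.done.set_result(len(batch))  # size of the batch it rode in
            if stop:
                return
```

A caller would hold on to the returned future and wait on it only when it needs the durability guarantee, for example `writer.submit(b"put k v").result()`. The point of the design is that many blocked writers share the latency of a single remote append instead of each paying for their own.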

Keywords/Search Tags:

KV storage system, RocksDB, RDMA, SPDK, Separation of computing and storage, Optane

Related items

1	Design And Implementation Of A Distributed Storage System Based On RocksDB Engine
2	RDMA-based Distributed Database Memory Storage System
3	Research And Implementation Of Distributed Storage System For NVMe And RDMA
4	Research And Implementation Of Key-Value Database Based On Separation Of Computing And Storage
5	Configuration Optimization Of RocksDB Storage Engine Based On Machine Learning
6	Research On Shuffle Technology Of Separation Of Computing And Storage In Big Data System
7	Research On Large Scale Parallel Storage Systems For Super Computing
8	Design And Implementation Of Software Definition Storage Based On NVMeOF
9	Design And Implementation Of Distributed Key-Value Storage System Based On RDMA
10	Design And Implementation Of NVM-based Distributed Backend Storage