RDMA-based Distributed Database Memory Storage System

Posted on: 2021-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: G Liang
Full Text: PDF
GTID: 2428330620964191
Subject: Engineering
Abstract/Summary:
With the rapid development of the economy and culture, data analysis and processing technology is becoming increasingly important. Conventional databases, however, cannot cope with today's extremely large data volumes in terms of analysis speed and scalability. As memory prices have fallen and distributed systems theory has matured, distributed in-memory databases have emerged. They have made great progress in massive data processing by overcoming the disk I/O bottleneck of conventional databases, but three problems remain. First, the network overhead of moving data between computing nodes and data storage nodes is too large. Second, during query preprocessing in the storage system, the CPU offers limited parallelism and computes slowly, so it can hardly filter massive data at high speed. Third, distributed data storage nodes are prone to hot spots, so the storage system must be load balanced in a timely manner.

Building on Goldfish, the laboratory's existing distributed database, this thesis finds that the query efficiency of the overall database system can be improved by applying RDMA networking and GPU acceleration to build a memory-based distributed storage system. It also studies load balancing for that system. The research focuses on the following aspects:

1) To address the network I/O bottleneck of distributed in-memory databases, the existing TCP/IP and InfiniBand architectures are analyzed in depth, and a network communication framework based on RDMA is constructed. Because RDMA hardware is not widely deployed, a TCP-based network communication framework is also constructed to support devices without RDMA network cards.

2) To address the low parallelism of CPU computation, query preprocessing is accelerated with GPU technology. In addition, RDMA network technology is used to avoid one data copy from GPU memory to CPU memory.

3) A distributed in-memory database storage system is designed on top of the RDMA network framework. To compress columnar data, a Group Key Index data structure based on a dictionary compression algorithm is proposed, offering high compression and fast queries. To improve the memory utilization and functionality of the database system, a ZSET data structure that supports insert operations is designed.

4) To address hot spots in distributed in-memory storage systems, a data migration algorithm for dynamic load balancing is proposed. The central metadata control node automatically migrates hot data away from high-load nodes to balance load across the data storage system.
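The abstract does not describe the internal layout of the Group Key Index, so the following is only a minimal sketch of the general technique it names: dictionary compression of a column plus an index that groups row positions by dictionary code. The class name `DictEncodedColumn` and its methods are hypothetical and are not taken from the thesis or from Goldfish.

```python
from bisect import bisect_left


class DictEncodedColumn:
    """Illustrative dictionary-compressed column (not the thesis's actual structure)."""

    def __init__(self, values):
        # Sorted dictionary of distinct values; a value's code is its rank.
        self.dictionary = sorted(set(values))
        code_of = {v: i for i, v in enumerate(self.dictionary)}
        # The column itself stores only small integer codes.
        self.codes = [code_of[v] for v in values]
        # Group row positions by code so equality lookups avoid a full scan.
        self.positions = {}
        for row, code in enumerate(self.codes):
            self.positions.setdefault(code, []).append(row)

    def lookup(self, value):
        """Return the row positions whose value equals `value`."""
        i = bisect_left(self.dictionary, value)
        if i == len(self.dictionary) or self.dictionary[i] != value:
            return []
        return list(self.positions[i])

    def get(self, row):
        """Decode the original value stored at `row`."""
        return self.dictionary[self.codes[row]]


col = DictEncodedColumn(["cn", "us", "cn", "de", "us", "cn"])
print(col.codes)         # integer codes stored in place of the strings
print(col.lookup("cn"))  # rows holding "cn", found without scanning the column
```

Repeated values are stored once in the dictionary, which is where the compression comes from, and the per-code position lists are what make equality queries fast. Supporting inserts, as the ZSET structure is said to do, would need additional machinery that this sketch leaves out.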
Keywords/Search Tags: In-Memory Database, RDMA (Remote Direct Memory Access), Column-oriented Storage, GPU Accelerated Computing, Load Balancing