With the growth of the digital economy, enterprises hold increasingly large volumes of real-time data as the amount of data generated in the real world explodes. Enterprises are increasingly concerned with the value of their data and face the challenge of managing both massive and real-time data. The analytical database is the core data management system that enterprises use to gain insight into the value of data and to guide decision-making, which makes it crucial to enterprise development. At the same time, new hardware technologies continue to evolve, and the emergence of Non-Volatile Memory (NVM), a new type of storage hardware, provides new opportunities for optimizing analytical databases. Since non-volatile memory differs from both traditional block devices and memory, analytical databases based on non-volatile memory need to be redesigned to take full advantage of its characteristics.

This paper is concerned with analytical database optimization based on non-volatile memory. In the storage engine, the columnar data access path and storage format on non-volatile memory are studied to optimize I/O performance. In addition, this paper studies the storage layout and strategy under hybrid storage of non-volatile memory and solid-state drives to balance performance and cost. In query execution, the hash join algorithm on non-volatile memory is studied and optimized for the asymmetric read and write performance of non-volatile memory, in order to improve hash join performance. The details are as follows.

(1) A columnar storage engine optimized for non-volatile memory is designed. Since the I/O characteristics of non-volatile memory are vastly different from those of block devices, applying a traditional columnar storage engine directly to non-volatile memory cannot fully utilize its characteristics. By analyzing the I/O path bottleneck of traditional columnar storage engines on non-volatile memory, this paper optimizes the columnar data I/O path and
columnar storage format on non-volatile memory to improve columnar data I/O performance. By analyzing the impact of incremental data writes on query performance, this paper designs a delta columnar storage to optimize query performance during incremental writes.

(2) Hybrid storage based on non-volatile memory and solid-state drives is designed. The capacity and cost of non-volatile memory are still at a disadvantage compared with traditional block devices, so using non-volatile memory alone in an analytical database is not optimal. This paper extends the non-volatile memory-based columnar storage engine to a hybrid storage system based on non-volatile memory and solid-state drives. The system layout of the hybrid storage system is redesigned, and its metadata management is optimized to reduce the system restart recovery time. Finally, a hybrid storage strategy for non-volatile memory and SSD is designed around the access characteristics of data segments, taking into account the access frequency and the range of data involved in queries.

(3) The hash join algorithm is optimized for non-volatile memory. This paper analyzes the performance of two memory-based hash join algorithms (NPO and PRO) running on non-volatile memory. The results show that these memory-optimized hash join algorithms do not achieve optimal performance on non-volatile memory, and that the performance bottleneck mainly comes from writes to non-volatile memory. Therefore, a hash join algorithm optimized for non-volatile memory needs to reduce or avoid writes to non-volatile memory. This paper optimizes the write-dominated build phase of the NPO algorithm and the write-dominated partition phase of the PRO algorithm to improve the performance of the traditional hash join algorithms on non-volatile memory.

This paper optimizes several key components of an analytical database based on non-volatile memory, including the columnar storage engine, cross-media hybrid storage, and hash join
algorithms. The optimization methods are designed around the characteristics of non-volatile memory, and several experiments are conducted on real non-volatile memory hardware, based on ClickHouse, an open-source analytical database, to verify the soundness of the proposed methods. The performance of the columnar storage engine optimized for non-volatile memory is 3.68 times higher than that of the traditional columnar storage engine on non-volatile memory; after extending to hybrid storage, the query throughput reaches 74% of the throughput obtained with non-volatile memory alone when non-volatile memory accounts for 20% of the storage space; and the performance of the hash join algorithm optimized for non-volatile memory is improved by up to 102% compared with the original method.
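
To illustrate why the build phase of a no-partitioning hash join is described above as write-dominated, the following is a minimal single-threaded sketch in Python. It is not the optimized algorithm of this paper, nor the multi-threaded NPO implementation; the function name `no_partition_hash_join` and the write/read counters are hypothetical and exist only to make the access pattern visible. Every build tuple causes one write into the hash table, while the probe phase only reads it, so on a medium with slow writes the build phase is the natural target for optimization.

```python
from collections import defaultdict


def no_partition_hash_join(build_rows, probe_rows):
    """Join two lists of (key, payload) tuples with one shared hash table.

    Returns the joined rows plus counters for hash-table writes and lookups
    (the counters are illustrative only, not part of any real algorithm).
    """
    table = defaultdict(list)
    writes = 0
    # Build phase: each tuple of the build relation is written into the table.
    for key, payload in build_rows:
        table[key].append(payload)
        writes += 1

    reads = 0
    results = []
    # Probe phase: the table is only read; .get() avoids creating new entries.
    for key, payload in probe_rows:
        reads += 1
        for build_payload in table.get(key, ()):
            results.append((key, build_payload, payload))
    return results, writes, reads


if __name__ == "__main__":
    r = [(1, "a"), (2, "b"), (2, "c")]
    s = [(2, "x"), (3, "y"), (1, "z")]
    joined, writes, reads = no_partition_hash_join(r, s)
    print(joined, writes, reads)
```

In this sketch the number of table writes equals the size of the build relation, which is consistent with the abstract's observation that reducing or avoiding writes is the main lever on non-volatile memory.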