
Research On Key Technologies Of Memory Data Management And Analysis

Posted on: 2021-08-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Li
Full Text: PDF
GTID: 1488306338479684
Subject: Computer software and theory
Abstract/Summary:
Data processing applications are growing rapidly. Data management technology, especially relational database management systems (DBMS), is widely used across industries, from spacecraft data systems to everyday shopping and payment systems. With the rapid development of Internet technology, in particular the expected growth of the 5G market and the continuous increase in the number of connected devices, database management systems will remain a research hotspot now and in the future. Computer hardware performance has improved qualitatively over the past ten years, and the most representative result is the large-scale adoption of in-memory data management technology.

This dissertation first describes the development of hardware and in-memory database technology, as well as two new trends in the data processing industry. (1) Hybrid online transactional/analytical processing. High-performance OLTP systems are now basically implemented on in-memory databases as standard, and workloads in which transactions and analysis coexist are a very common business scenario. Building systems that fuse transaction processing and analysis on top of in-memory databases is therefore a current trend. (2) The relationship between databases and artificial intelligence. Databases can offer artificial intelligence extensive experience in big data engineering and in fully exploiting hardware performance, while artificial intelligence can give databases the ability to customize for many scenarios.

In response to these trends, this dissertation proposes several techniques for in-memory data management.

(1) Learning-based skiplist index technology. The skiplist is a widely used index structure in databases, but the number of levels of each node is generated by a random algorithm. This leads to unstable performance, because the classic skiplist structure does not take the characteristics of the data into account. This dissertation estimates the cumulative distribution function of the data with kernel density estimation, predicts the position of each data item in the skiplist, and then designs a skiplist algorithm that determines the number of levels of each node deterministically. In addition, we find that during a skiplist lookup, nodes with more levels have a higher probability of being accessed. Based on the access frequency of historical data, we design a scheme that places frequently accessed "hot" data in the upper layers of the skiplist as far as possible and leaves less frequently accessed "cold" data in the lower layers. Finally, using synthetic and real data, we carry out a comprehensive experimental evaluation of the standard skiplist and five improved skiplist algorithms, and we release the source code. The experimental results show that the optimized skiplist achieves a performance improvement of up to 60%, which points to a useful direction for future researchers and system developers.

(2) Asynchronous snapshot technologies for in-memory storage engines. The academic community has proposed various snapshot algorithms that trade off throughput against latency, yet in-memory databases such as Redis continue to rely on a simple fork call to generate snapshots. To understand this phenomenon, we performed a comprehensive performance evaluation of mainstream snapshot algorithms. Surprisingly, the simple fork approach outperforms most techniques in update-intensive workloads. The evaluation shows that fork performs better than the representative snapshot algorithms from academia, but slightly worse than Hourglass and Piggyback. In addition, we propose a virtual snapshot technology for a wider range of transaction processing scenarios. Finally, we have released the implementation code of all the snapshot algorithms mentioned above, so that practitioners can benchmark each algorithm and choose the appropriate method for different application scenarios.

(3) Storage engine for mixed workloads. This dissertation proposes a wait-free HTAP (WHTAP) architecture, which can efficiently serve OLTP and OLAP requests in a wait-free manner. We developed and evaluated a prototype WHTAP system. Experiments show that the system achieves OLTP performance similar to that of the TicToc system while accelerating analytical processing by a factor of 4 to 6.

(4) Performance evaluation of extreme learning machines on different computing chips. As an important machine learning algorithm, the Extreme Learning Machine (ELM) is known for its excellent learning speed. Although hardware acceleration is an obvious solution, how to choose the right acceleration hardware for ELM-based applications deserves further discussion. We designed and evaluated optimized ELM algorithms on three of the most advanced kinds of acceleration hardware, i.e. the multi-core CPU, the graphics processing unit (GPU), and the field-programmable gate array (FPGA). Based on the experimental results, we recommend (a) using GPUs to accelerate ELM algorithms on large data sets and (b) using FPGAs for small data sets because of their lower power consumption, especially in some embedded applications. We have also released the source code.
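The learning-based skiplist idea in contribution (1) can be illustrated with a minimal sketch. It is not the dissertation's implementation: the Gaussian kernel, the fixed bandwidth, the helper names (`kde_cdf`, `level_for_rank`, `predicted_level`), and the rule "level = 1 + number of trailing zero bits of the predicted rank" (the shape of a perfectly balanced skiplist) are all assumptions chosen for illustration.

```python
import math


def kde_cdf(sample, bandwidth):
    """Approximate the data CDF with a Gaussian kernel density estimate.

    Assumption: Gaussian kernel with a fixed, caller-chosen bandwidth.
    """
    n = len(sample)
    scale = bandwidth * math.sqrt(2.0)

    def cdf(x):
        # Average of the Gaussian CDFs centred on each sample point.
        return sum(0.5 * (1.0 + math.erf((x - s) / scale)) for s in sample) / n

    return cdf


def level_for_rank(rank, max_level):
    """Node height at a 1-based rank in a perfectly balanced skiplist.

    Every 2nd node reaches level 2, every 4th reaches level 3, and so on,
    so the height is 1 + the number of trailing zero bits of the rank.
    """
    level = 1
    while rank % 2 == 0 and level < max_level:
        rank //= 2
        level += 1
    return level


def predicted_level(key, cdf, expected_size, max_level):
    """Deterministic level for a key: predict its rank from the CDF, then map rank to height."""
    rank = round(cdf(key) * expected_size)
    rank = max(1, min(expected_size, rank))  # clamp to a valid 1-based rank
    return level_for_rank(rank, max_level)
```

A caller would fit `kde_cdf` on a sample of keys once and then call `predicted_level` in place of the usual coin-flip level generator when inserting a key. Keys drawn from the same distribution then receive levels close to those of a balanced structure, independent of random choices.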
Keywords/Search Tags: modern hardware, in-memory databases, hybrid transactional/analytical processing, artificial intelligence, snapshot, skiplist, extreme learning machine