In the information age, there is a growing demand for storing massive amounts of data and analyzing them in real time. When building online analytical processing (OLAP) databases, enterprise users increasingly store data on cloud object storage because of its low cost, large capacity, and pay-as-you-go pricing. However, the performance of such external storage differs greatly from that of local storage, and reading data from remote storage is costly. As a result, I/O at the storage layer often limits the query performance of this kind of database. Caching is a crucial way to reduce the time spent reading data: caching data from cloud object storage in local memory and on local disk can improve the performance of OLAP databases. OLAP databases are typically column-store systems, and both their storage data layout and their workloads differ considerably from those of row-store databases. OLAP workloads consist mainly of range queries, for which caching by page or by row, as is common in row-store databases, does not provide effective performance improvements. Therefore, based on an analysis of the characteristics of column storage and OLAP workloads, this paper proposes a design for a two-level hybrid cache spanning local memory and disk, including a cache replacement algorithm based on a cache value score and a cache partitioning algorithm based on column groups. The main contributions of this paper are as follows:

(1) Analysis of caching strategies for OLAP workloads. First, this paper analyzes the common data layouts of column stores and the workloads of OLAP databases. It then analyzes the cache design factors for OLAP workloads, including cache location, cache granularity, and multi-level cache design, which provide the basis for the hybrid cache architecture. Finally, to explore the factors that influence the cache replacement strategy, experiments are designed to measure how the size of cache items and the proportion of a query's data that is already cached affect query time.

(2) Design and implementation of a hybrid memory and disk cache. This paper proposes a hybrid cache architecture spanning local memory and disk and describes the design and implementation of each module. Both the memory cache and the disk cache support cache replacement based on an arbitrary ranking (sorting) criterion. Compared with the memory cache, the disk cache is specifically optimized to handle a very large number of cache items. This paper then proposes a cache replacement algorithm based on a cache value score: cache items are ranked by their score, and replacement is carried out according to this ranking. The score formula takes into account the recency, frequency, size, and historical access interval of each cache item.

(3) Design and implementation of an algorithm that dynamically partitions cache space based on column groups. This paper proposes a cache partitioning algorithm based on column groups: columns that frequently appear together in queries are placed in the same column group, and the data of all columns in a group share the same cache space. This partitioning protects hot data from being evicted. The paper further proposes an adaptive dynamic partitioning algorithm that divides the cache space by column group and adjusts the size of each partition as the workload changes.

The above solutions are implemented in ClickHouse, an open-source column-store database. Extensive experiments compare the proposed schemes with the LRU algorithm along three metrics: cache hit ratio, cache hit size ratio, and query time. The results show that the cache replacement algorithm based on the cache value score, the dynamic partitioning algorithm based on column groups, and the hybrid strategy combining the two all achieve a higher cache hit ratio and better query performance than LRU.
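The statement that both tiers support replacement based on any ranking suggests making the eviction order a pluggable function. The class below is a hypothetical, byte-budgeted sketch of that idea: `rank(meta, now)` returns a number and the lowest-ranked items are evicted first. `lru_rank` reproduces plain LRU, while `hits_per_byte_rank` is only an illustrative stand-in for a size-aware score such as the cache value score, not the paper's formula. Sorting every resident item on each eviction is deliberately naive; a tier holding very many items, as the disk cache is described as doing, would need a cheaper structure.

```python
from typing import Any, Callable, Dict, Hashable, Optional, Tuple

Meta = Dict[str, int]
RankFn = Callable[[Meta, int], float]


class RankedCache:
    """Hypothetical byte-budgeted cache whose eviction order comes from a rank function."""

    def __init__(self, capacity_bytes: int, rank: RankFn) -> None:
        self.capacity = capacity_bytes
        self.rank = rank  # rank(meta, now): lower values are evicted first
        self.items: Dict[Hashable, Tuple[Any, Meta]] = {}
        self.used = 0
        self.clock = 0  # logical time, advanced on every get/put

    def get(self, key: Hashable) -> Optional[Any]:
        self.clock += 1
        entry = self.items.get(key)
        if entry is None:
            return None
        value, meta = entry
        meta["hits"] += 1
        meta["last"] = self.clock
        return value

    def put(self, key: Hashable, value: Any, size: int) -> bool:
        if size > self.capacity:
            return False  # can never fit; do not cache it
        self.clock += 1
        if key in self.items:
            self.used -= self.items.pop(key)[1]["size"]
        self._evict_until_fits(size)
        self.items[key] = (value, {"size": size, "hits": 1, "last": self.clock})
        self.used += size
        return True

    def _evict_until_fits(self, incoming: int) -> None:
        if self.used + incoming <= self.capacity:
            return
        # Rank every resident item once, then evict from the lowest rank up.
        order = sorted(self.items, key=lambda k: self.rank(self.items[k][1], self.clock))
        for k in order:
            if self.used + incoming <= self.capacity:
                break
            self.used -= self.items.pop(k)[1]["size"]


def lru_rank(meta: Meta, now: int) -> float:
    # Least recently used first.
    return meta["last"]


def hits_per_byte_rank(meta: Meta, now: int) -> float:
    # Illustrative size-aware ranking: few hits on a large item ranks lowest.
    return meta["hits"] / max(meta["size"], 1)
```

Swapping `lru_rank` for a different function changes which items survive without touching the cache's bookkeeping, which is what allows a baseline such as LRU and a score-based policy to be compared on the same cache code.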
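The cache value score is described only by its four inputs (recency, frequency, size, and historical access interval), not by its formula. The function below is a hypothetical way to combine them, assuming reciprocal decay for recency and for the mean access interval, log-damped frequency, and a log-scaled size penalty. The name `cache_value_score` and all weights and exponents are illustrative assumptions rather than the paper's formula; higher scores mean an item is more worth keeping.

```python
import math
from typing import Sequence


def cache_value_score(
    access_times: Sequence[float],
    size_bytes: int,
    now: float,
    *,
    w_recency: float = 1.0,
    w_frequency: float = 1.0,
    w_interval: float = 1.0,
    size_exponent: float = 1.0,
) -> float:
    """Hypothetical cache value score; higher means more worth keeping.

    access_times: timestamps of past accesses, oldest first (non-empty).
    """
    if not access_times:
        raise ValueError("an item must have been accessed at least once")
    # Recency: decays as the time since the last access grows.
    recency = 1.0 / (1.0 + (now - access_times[-1]))
    # Frequency: log-damped so that very hot items do not dominate.
    frequency = math.log1p(len(access_times))
    # Historical access interval: a short mean gap between accesses
    # suggests the item will be reused soon.
    if len(access_times) >= 2:
        gaps = [b - a for a, b in zip(access_times, access_times[1:])]
        interval = 1.0 / (1.0 + sum(gaps) / len(gaps))
    else:
        interval = 0.0
    benefit = w_recency * recency + w_frequency * frequency + w_interval * interval
    # Size: large items occupy more cache space, so the benefit is divided
    # by a log-scaled size penalty that is always >= 1.
    return benefit / math.log2(2 + size_bytes) ** size_exponent


hot = cache_value_score([90, 95, 99], size_bytes=1024, now=100)
cold = cache_value_score([10], size_bytes=1 << 20, now=100)
```

Here `hot` (small, accessed three times, most recently one time unit ago) scores well above `cold` (large, accessed once long ago); ranking items by such a score and evicting the lowest first is one way to realize score-ordered replacement.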
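How "columns that frequently appear together" is measured is not specified. A hypothetical sketch, assuming a query log in which each query is recorded as the set of columns it reads: count how often each pair of columns is read together, and merge two columns with union-find when they co-occur in at least `threshold` of the queries that touch the rarer of the two. The co-occurrence measure, the `threshold` value, and the function name `column_groups` are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set


def column_groups(queries: Iterable[Set[str]], threshold: float = 0.6) -> List[List[str]]:
    """Hypothetical grouping of columns that are frequently read together."""
    freq: Counter = Counter()  # number of queries touching each column
    pair: Counter = Counter()  # number of queries touching each pair of columns
    for cols in queries:
        cols_sorted = sorted(set(cols))
        freq.update(cols_sorted)
        pair.update(combinations(cols_sorted, 2))

    parent: Dict[str, str] = {c: c for c in freq}

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for (a, b), together in pair.items():
        # Co-occurrence measured against the rarer column of the pair.
        if together / min(freq[a], freq[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for c in freq:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


log = [{"date", "price"}, {"date", "price", "qty"}, {"user", "region"},
       {"user", "region"}, {"date", "price"}, {"qty"}]
groups = column_groups(log)
```

With this log, `date` and `price` always appear together and so do `user` and `region`, while `qty` co-occurs with them in only half of its queries, so `groups` becomes `[['date', 'price'], ['qty'], ['region', 'user']]`. Because union-find merges transitively, a chain of strongly co-occurring pairs ends up in one group.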
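For the adaptive partitioning, one simple hypothetical policy is to give every column group a guaranteed minimum share of the cache and divide the remainder in proportion to each group's recent demand (for example, bytes read from that group in the last time window). Calling the function periodically with fresh demand figures lets partition sizes follow the workload, and the floor keeps a briefly idle group from being squeezed to zero. The policy, the `min_share` floor, the notion of demand, and the name `repartition` are all assumptions, not the paper's algorithm.

```python
from typing import Dict


def repartition(total_bytes: int, demand: Dict[str, float],
                min_share: float = 0.05) -> Dict[str, int]:
    """Hypothetical policy: per-group floor plus a demand-proportional remainder."""
    groups = sorted(demand)
    if not groups:
        return {}
    floor = int(total_bytes * min_share)
    if floor * len(groups) > total_bytes:
        raise ValueError("min_share is too large for the number of groups")
    spare = total_bytes - floor * len(groups)
    total_demand = sum(demand.values())
    alloc: Dict[str, int] = {}
    for g in groups:
        # With no recorded demand at all, split the spare space evenly.
        share = demand[g] / total_demand if total_demand > 0 else 1 / len(groups)
        alloc[g] = floor + int(spare * share)
    # Rounding down can leave a few bytes unassigned; give them to the busiest group.
    busiest = max(groups, key=lambda g: demand[g])
    alloc[busiest] += total_bytes - sum(alloc.values())
    return alloc


sizes = repartition(1000, {"g1": 300, "g2": 100, "g3": 0}, min_share=0.1)
```

Each group first receives 100 bytes, and the remaining 700 are split 3:1 between `g1` and `g2`, so `sizes` is `{'g1': 625, 'g2': 275, 'g3': 100}`.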
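The evaluation reports cache hit ratio and cache hit size ratio against an LRU baseline. Assuming the hit size ratio is the fraction of requested bytes served from the cache (commonly called the byte hit ratio), the hypothetical trace simulator below computes both metrics for LRU. The trace format of `(key, size_in_bytes)` pairs and the function name `simulate_lru` are illustrative assumptions.

```python
from collections import OrderedDict
from typing import Dict, List, Tuple


def simulate_lru(trace: List[Tuple[str, int]], capacity_bytes: int) -> Dict[str, float]:
    """Replay (key, size) requests against an LRU cache and report both hit metrics."""
    cache: "OrderedDict[str, int]" = OrderedDict()  # key -> size, oldest first
    used = hits = hit_bytes = total_bytes = 0
    for key, size in trace:
        total_bytes += size
        if key in cache:
            cache.move_to_end(key)
            hits += 1
            hit_bytes += size
            continue
        if size > capacity_bytes:
            continue  # can never be cached; counts as a miss
        while used + size > capacity_bytes:
            _, old_size = cache.popitem(last=False)  # evict least recently used
            used -= old_size
        cache[key] = size
        used += size
    n = len(trace)
    return {
        "hit_ratio": hits / n if n else 0.0,
        "hit_size_ratio": hit_bytes / total_bytes if total_bytes else 0.0,
    }


trace = [("a", 10), ("b", 90), ("a", 10), ("b", 90)]
small = simulate_lru(trace, capacity_bytes=50)
```

With a 50-byte cache only the small item `a` is ever cached, so one request in four hits (`hit_ratio` 0.25) but only 10 of 200 requested bytes come from the cache (`hit_size_ratio` 0.05). The two metrics diverge whenever item sizes vary, which is why reporting both alongside query time is informative.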