The rapid development of Internet technology and big data has spawned a large number of business-analysis applications with massive and complex computing requirements. Enabling the underlying analytical query engine to respond quickly to such requirements is a major challenge. Both academia and industry have proposed caching schemes at different levels to optimize query execution in database systems and achieve lower query latency. This paper studies the design and implementation of operator caches for analytical scenarios. It designs and implements a prototype Filter operator cache for the Select operation of an analytical database; the Filter operator cache uses the "semantic tree"-based cache matching algorithm proposed in this paper. The paper further explores the feasibility of higher-level operator caches and methods for optimizing them, focusing on a general paradigm for operator cache design. Finally, the cost-based optimizer is extended to construct an adaptive Sort operator cache.

The contributions of this paper can be summarized as follows:

1. Design of the operator cache mechanism. After surveying query-acceleration solutions including (but not limited to) materialized views, query result caching, shared data for concurrent queries, middleware caching, and Alluxio, the existing schemes are classified along the dimension of "the amount of computation contained in the cache". This classification yields the role of the operator cache: the execution result of a given operator is cached in an external storage system, and during subsequent execution the cached result is filtered and sorted before being returned to the upstream operator.

2. Design and implementation of the cache matching algorithm. The matching problem between caches is abstracted into a matching problem between trees. Lexical analysis is used to build a "semantic tree" that represents a cache entry; on these semantic trees, the subtree matching algorithm proposed in this paper determines whether an inclusion relationship holds between cached result sets. Cache matching is the key to implementing the Filter operator cache, since it guarantees the correctness of the entire cache system.

3. Design and implementation of a cost optimizer tailored to the characteristics of high-level operator caches. Although the Sort operator cache and the Filter operator cache are similar in principle and operation, the Sort operator cache consumes far more resources during cache matching and cache processing than the Filter operator cache. A new mechanism is therefore needed for this high-level operator cache to prevent cache use from degrading query performance. The solution in this paper adds operator-cache cost-calculation logic and modifies the cost-based optimizer, realizing an adaptive Sort operator cache.

4. Implementation of the Filter and Sort operator caches on the open-source analytical database ClickHouse. Drawing on an understanding of ClickHouse's query execution mechanism and of how the Filter and Sort operators work, this paper addresses issues such as the caching mechanism, cache storage, cache processing, and cache matching, and verifies correctness and performance in the experimental stage.

In general, this paper proposes Filter and Sort operator cache mechanisms for the commonly used Select and Sort operations, designing them from the perspectives of caching method, cache matching algorithm, and cost-based optimization. The caches show reasonable overhead and good performance in tests across various scenarios.
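The operator cache workflow in contribution 1 (probe an external store, re-filter a cached result, return it upstream, populate on a miss) can be sketched as follows. This is a minimal illustrative sketch, not the thesis' implementation: the in-memory dictionary stands in for the external storage system, and the string key stands in for the semantic-tree matching step.

```python
# Illustrative flow of an operator cache (all names are invented for
# this sketch): the result of a Filter operator is stored externally;
# a later query that matches an entry re-filters the cached rows
# instead of rescanning, then returns them to the upstream operator.

cache_store = {}  # stand-in for the external storage system


def filter_operator(rows, predicate, pred_key):
    """Execute a Filter operator with operator-cache support."""
    # 1. Probe the cache for a usable entry (a real system would run
    #    semantic-tree matching here instead of exact key lookup).
    if pred_key in cache_store:
        # 2. Hit: re-apply the query's own predicate to the cached
        #    result before handing rows to the upstream operator.
        return [r for r in cache_store[pred_key] if predicate(r)]
    # 3. Miss: compute normally and populate the cache.
    result = [r for r in rows if predicate(r)]
    cache_store[pred_key] = result
    return result


rows = [1, 5, 10, 20]
print(filter_operator(rows, lambda r: r > 4, "r>4"))   # computed: [5, 10, 20]
print(filter_operator([], lambda r: r > 9, "r>4"))     # served from cache: [10, 20]
```

The second call passes an empty source, showing that the rows come entirely from the cached result of the first call.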
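For contribution 2, the abstract states only that a "semantic tree" is built by lexical analysis and that a subtree matching algorithm decides whether one cached result set includes another. As a hedged sketch of the underlying idea, restricted to pure conjunctive predicates (all function and class names here are invented, not the thesis' own): a cached result for predicate P can answer a query with predicate Q when Q implies P, which for conjunctions holds when every conjunct of P also appears in Q.

```python
# Sketch of inclusion checking between conjunctive filter predicates.
# The real system builds a richer "semantic tree" via lexical analysis;
# here a predicate is flattened to a set of leaf comparisons.

from dataclasses import dataclass


@dataclass(frozen=True)
class Pred:
    """A leaf comparison such as `a > 1`."""
    column: str
    op: str
    value: str


def parse_conjuncts(expr):
    """Toy lexical step: split 'a>1 AND b=2' into leaf predicates."""
    leaves = set()
    for part in expr.split(" AND "):
        # try multi-character operators first so '>=' is not read as '>'
        for op in (">=", "<=", "!=", ">", "<", "="):
            if op in part:
                col, val = part.split(op, 1)
                leaves.add(Pred(col.strip(), op, val.strip()))
                break
    return frozenset(leaves)


def cache_covers(cached_expr, query_expr):
    """True if the cached result set can answer the query.

    The query predicate implies the cached predicate when every cached
    conjunct also appears in the query, so re-filtering the (wider)
    cached rows by the query predicate yields a correct answer."""
    return parse_conjuncts(cached_expr) <= parse_conjuncts(query_expr)


print(cache_covers("a>1", "a>1 AND b=2"))   # True: cache is a superset
print(cache_covers("a>1 AND b=2", "a>1"))   # False: cache is too narrow
```

Note the direction of the check: a *wider* cached predicate is usable because the missing conjuncts can be re-applied, whereas a narrower one may have discarded needed rows.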
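The adaptive decision in contribution 3 (take the Sort cache path only when it is estimated to be cheaper than recomputation) can be outlined with a toy cost model. The cost formulas below are invented for illustration and are not the thesis' actual cost functions.

```python
# Hypothetical sketch of the adaptive Sort-operator-cache decision:
# the cost-based optimizer compares the estimated cost of using the
# cache (matching overhead plus a pass over the cached rows) against
# the cost of recomputing the sort from scratch.

import math


def cost_recompute(n_rows):
    """Toy model: full comparison sort, O(n log n)."""
    return n_rows * max(math.log2(n_rows), 1.0)


def cost_use_cache(cached_rows, match_overhead):
    """Toy model: semantic-tree matching plus a linear pass over the
    cached result (the expensive part the abstract warns about)."""
    return match_overhead + cached_rows


def should_use_sort_cache(n_rows, cached_rows, match_overhead):
    """The adaptive choice: only take the cache path when it wins."""
    return cost_use_cache(cached_rows, match_overhead) < cost_recompute(n_rows)


# Large input, small cached result: the cache path wins.
print(should_use_sort_cache(1_000_000, 50_000, 1000.0))  # True
# Tiny input: matching overhead dominates, so recompute instead.
print(should_use_sort_cache(100, 100, 1000.0))           # False
```

This captures the abstract's point that, unlike the cheap Filter cache, a Sort cache must be guarded by a cost check so that cache use never slows the query down.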