Design And Implementation Of Distributed Cache Management System For In-memory Columnar Database

Posted on: 2018-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: S Zheng
Full Text: PDF
GTID: 2348330512983052
Subject: Computer application technology
Abstract/Summary:
As an important tool for storing and retrieving information, the database has always evolved with demand. System architectures have moved from standalone to distributed, storage media from hard disk to memory, and data organization from row-oriented to column-oriented. Cache technology, one of the key optimization methods for databases, has evolved alongside them. A database cache typically exploits the characteristics of the system and stores the results of earlier queries so that repeated operations need not be executed again, which improves query speed. By the level of the cached object, current cache technologies fall into three categories: page cache, tuple cache, and semantic cache. The first two meet the requirements of traditional databases. Semantic cache is highly abstract and therefore widely applicable, but that generality reduces its ability to optimize for specific scenarios. For On-Line Analytical Processing (OLAP) workloads, which have low concurrency and high data throughput, designing a cache method for in-memory distributed columnar databases is an active research topic.

Based on the columnar data organization and asynchronous computation model of Goldfish, a distributed columnar database that we developed ourselves, the thesis presents a Distributed Physical Planning Semantic Cache (DPPSCache). DPPSCache stores intermediate data to avoid repeated computation and to reduce the amount of data transferred over the network, thereby improving OLAP query speed. The thesis covers the cache organization method, the cache matching algorithm, the cost model and replacement algorithm, and cache reliability. The main work of this thesis is as follows:
1. DPPSCache caches the intermediate results of physical operations in distributed physical planning, and constructs indexed cache characteristic trees from the local and global semantic information of those operations.
2. Building on an analysis of semantic matching and value range matching, the thesis presents a Cache Characteristic Tree Matching Algorithm (CCTM).
3. Based on the cache features of a distributed columnar database, the thesis presents a Reference and Cost Based Replacement Algorithm (RCBR) together with a cost model.
4. Because in-memory cache data is easily lost in a distributed system, the thesis designs several cache backup strategies, including multiple replication, erasure coding, and persistence.

The thesis designs and implements a distributed cache management system based on DPPSCache and compares its performance with two open-source distributed databases, Hive and Spark SQL. The test results show that Goldfish with the cache not only outperforms the original system but also outperforms Hive and Spark SQL. In addition, the RCBR algorithm outperforms the traditional replacement algorithms Least Recently Used (LRU) and Least Frequently Used (LFU).
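
The abstract does not give the RCBR formula or cost model, so the following is only a minimal sketch of the general idea of replacement driven by references and cost, not the thesis's algorithm. The class names, the `recompute_cost` field, and the scoring rule (recomputation cost times reference count, divided by size) are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CacheEntry:
    key: str
    size_bytes: int
    recompute_cost: float  # hypothetical estimate of the cost to rebuild this result
    references: int = 0    # number of cache hits so far


class CostAwareCache:
    """Toy cache that evicts the entry with the lowest benefit per byte."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.entries = {}  # maps key -> CacheEntry

    @staticmethod
    def score(entry: CacheEntry) -> float:
        # Assumed scoring rule: costly, frequently referenced, small entries
        # are the most valuable to keep.
        return entry.recompute_cost * (1 + entry.references) / entry.size_bytes

    def get(self, key: str) -> CacheEntry | None:
        entry = self.entries.get(key)
        if entry is not None:
            entry.references += 1
        return entry

    def put(self, key: str, size_bytes: int, recompute_cost: float) -> list[str]:
        """Insert an entry and return the keys evicted to make room."""
        if size_bytes > self.capacity_bytes:
            return []  # larger than the whole cache, so it is not cached
        if key in self.entries:
            self.used_bytes -= self.entries.pop(key).size_bytes
        evicted = []
        while self.used_bytes + size_bytes > self.capacity_bytes:
            victim = min(self.entries.values(), key=self.score)
            del self.entries[victim.key]
            self.used_bytes -= victim.size_bytes
            evicted.append(victim.key)
        self.entries[key] = CacheEntry(key, size_bytes, recompute_cost)
        self.used_bytes += size_bytes
        return evicted


cache = CostAwareCache(capacity_bytes=100)
cache.put("scan", 40, recompute_cost=10.0)
cache.put("join", 40, recompute_cost=100.0)
cache.get("scan")
cache.get("scan")
evicted = cache.put("agg", 40, recompute_cost=50.0)
```

In this usage, `scan` was touched most recently and most often, so LRU and LFU would both keep it and evict `join`. The cost-aware score instead evicts `scan`, because the `join` result is assumed to be ten times more expensive to recompute.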
Keywords/Search Tags:distributed cache, cache characteristic tree, cache matching, cache replacement, cache reliability