Optimization For Asynchronous Incremental View Maintenance Of Hybrid Workloads

Posted on: 2021-05-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H C Duan
Full Text: PDF
GTID: 1368330647455161
Subject: Software Engineering
Abstract/Summary:
As data volumes and user counts grow, increasingly diverse business requirements and complex Hybrid Transaction/Analytical Processing (HTAP) workloads place higher demands on the transaction throughput and analytical query latency of database systems. There is currently no recognized optimal solution for HTAP workloads, but transaction processing systems based on the Log-Structured Merge-Tree (LSM-tree), such as Alibaba's OceanBase, have proven efficient at transaction processing. Building analytical query capabilities on top of scalable transaction processing systems to serve HTAP workloads has therefore become an active research topic. Materialized views effectively reduce query latency by precomputing and caching results, and are an important technique for analyzing large-scale data. However, keeping a view consistent with its base tables means that every base-table change triggers a view update, which inevitably introduces additional overhead. To maintain views efficiently on scalable transaction processing systems, three core problems must be solved: how to design the storage and maintenance process of views under this new architecture, how to minimize the additional cost imposed on transaction processing while preserving the benefits of view queries, and how to optimize the system overhead. In response, this dissertation designs a series of optimization methods around the architecture of scalable transaction processing systems, transaction processing, and IO resources, which fundamentally reduce the maintenance cost of views. The main work and contributions of this dissertation are as follows:

(1) Propose an incremental view maintenance strategy tailored to the new distributed LSM-tree architecture: The rapid growth of data scale has driven a shift from the traditional scale-up, stand-alone database architecture to the new scale-out, distributed database architecture. The distributed LSM-tree is a typical solution that provides scalable transaction processing capabilities. However, materialized view maintenance on this new type of architecture has received little research attention. We summarize the architectural features of the distributed LSM-tree and the design elements of view implementation, and propose an asynchronous maintenance method that decouples view maintenance from transaction processing, filling the gap in this field. While ensuring consistency, we propose specific view maintenance strategies for different workloads. To improve the performance of multi-table join views, we design and implement a view maintenance process based on multiple two-table joins, so that the view records that must be modified in response to base-table updates can be located directly. This effectively reduces the cost of view maintenance and its impact on transaction processing.

(2) Optimize incremental view maintenance under high-throughput transaction workloads: Current view update methods usually take each individual operation or log record as the input of the computation, so the corresponding optimizations can only target a single operation or several operations on the same base table. Under high-throughput transaction workloads, the impact of view maintenance on transaction processing is magnified, and the performance of view maintenance needs further optimization. Taking the perspective of an entire transaction that contains several operations, we combine multiple base-table operations to generate the view increments, which greatly reduces the view maintenance overhead. In OLTP (On-line Transaction Processing) workloads, a transaction often operates on multiple tables, which implies a correlation among the data in those tables, and this correlation is also what OLAP (On-line Analytical Processing) workloads are concerned with. We analyze the logical information in transactions that can be used for view maintenance. The base tables that are commonly updated together in a transaction form a slice, and the increment of the entire slice can be obtained directly by combining the operations on these base tables. Compared with the traditional method of computing view increments from single-row operations, the transaction-granular view maintenance process performs multi-table incremental computation in batches, which greatly improves the maintenance efficiency of multi-table join views. Moreover, we propose two optimization techniques: reducing the computational overhead by optimizing the expression of the incremental computation, and reducing the maintenance cost by avoiding invalid base-table accesses.

(3) Optimize IO consumption for view maintenance: Since the data accessed by transactions and by analytical queries often overlap under HTAP workloads, we focus on optimizing the overall IO overhead of transaction processing, analytical queries, and view maintenance. When a base table is modified, we only record the view maintenance tasks instead of synchronously updating the view. These tasks are completed by reusing the IO results of subsequent transactions and queries. We design and implement a view maintenance strategy for multi-table join views that supports IO sharing. A view graph maintains the join relationships between the base tables, so that transaction execution generates not only the incremental records for the base table but also the related view maintenance tasks derived from the view graph. Subsequent transactions or queries can therefore complete these tasks without incurring additional IO costs, which effectively reduces the maintenance overhead. In addition, the incremental computation method, which is based on a multi-version implementation, ensures the consistency of the view and the base tables under asynchronous updates.

To sum up, this dissertation studies asynchronous incremental view maintenance strategies under HTAP workloads in depth. It designs, implements, and optimizes the storage structures and maintenance methods of views in different scenarios. First, for the widely used distributed LSM-tree database architecture, it designs and implements an efficient incremental view maintenance method. Second, it studies techniques that accelerate incremental view computation by treating the operations in a transaction as a whole, and further optimizes them by analyzing the formal expression of the incremental computation. Finally, to address the IO consumption of view maintenance in more general scenarios, it reuses the IO resources of transaction processing and analytical queries when executing view maintenance, which fundamentally reduces the cost of view maintenance. Extensive experiments demonstrate the effectiveness of the proposed methods. Several directions are worthy of further research: selecting which views should answer a given analytical load, using machine learning techniques to guide the execution of asynchronous view maintenance tasks, and recommending materialized views under HTAP workloads.
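The idea of recording one maintenance task per committed transaction and applying it to a join view later can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the `JoinView` class, its table layout, and all method names are hypothetical, and it assumes an insert-only workload in a single process with no multi-versioning. The refresh step uses the standard delta rule for a two-table join, applied to the inserts of a whole transaction at once.

```python
from collections import defaultdict


class JoinView:
    """Hypothetical sketch: view = orders JOIN customers ON customer_id.

    Assumptions (not from the source): insert-only workload, one process,
    no multi-versioning. Base tables are plain dictionaries.
    """

    def __init__(self):
        self.customers = {}                        # customer_id -> name
        self.orders = {}                           # order_id -> (customer_id, amount)
        self.orders_by_customer = defaultdict(set)  # join index on customer_id
        self.view = {}                             # order_id -> (name, amount)
        self.pending = []                          # one deferred task per transaction

    def commit(self, new_customers, new_orders):
        """Apply a transaction to the base tables and only record a task."""
        self.customers.update(new_customers)
        for order_id, (customer_id, amount) in new_orders.items():
            self.orders[order_id] = (customer_id, amount)
            self.orders_by_customer[customer_id].add(order_id)
        # The view itself is not touched here (asynchronous maintenance).
        self.pending.append((dict(new_customers), dict(new_orders)))

    def refresh(self):
        """Apply deferred tasks in commit order; return how many were applied."""
        applied = 0
        for delta_customers, delta_orders in self.pending:
            # delta(orders) JOIN customers
            for order_id, (customer_id, amount) in delta_orders.items():
                if customer_id in self.customers:
                    self.view[order_id] = (self.customers[customer_id], amount)
            # delta(customers) JOIN orders, located through the join index
            for customer_id, name in delta_customers.items():
                for order_id in self.orders_by_customer.get(customer_id, ()):
                    self.view[order_id] = (name, self.orders[order_id][1])
            applied += 1
        self.pending.clear()
        return applied
```

In this sketch a commit costs only the base-table writes plus one appended task, and `refresh` touches only the view rows reachable from the recorded deltas. Because rows are never updated or deleted here, joining a delta against the current base tables is safe; a workload with updates would need versioned reads, which the abstract attributes to its multi-version incremental computation.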
Keywords/Search Tags:Incremental View Maintenance, Asynchronous View Maintenance, LSM-tree, HTAP, IO sharing