
Parallel Log Replay Optimization In HTAP System

Posted on: 2022-12-31
Degree: Master
Type: Thesis
Country: China
Candidate: J B Sun
GTID: 2518306776493564
Subject: Management Science
Abstract/Summary:
A growing number of database usage scenarios require the database system to support transaction processing and analytical query workloads at the same time. A common solution is to serve the two workloads with separate OLTP and OLAP systems. Because data must be synchronized between the two systems, however, the traditional ETL-based method and the single-threaded sequential log replay method cannot meet the needs of most application scenarios, especially HTAP workloads with strict real-time requirements. To serve such workloads, the data freshness of OLAP must be improved as much as possible. Data synchronization between the two systems relies mainly on the log, and parallel log replay is an effective way to improve the data freshness of OLAP. However, conflicts that arise during parallel log replay reduce the replay speed and, in turn, the data freshness of OLAP. To address this problem, this thesis optimizes the parallel log replay process of an HTAP system in order to improve the data freshness of OLAP. The main research contents and contributions of this thesis are as follows:

(1) A fine-grained parallel log replay method based on Table-ID: When a log scheduling method based on Transaction-ID encounters a conflict during parallel log replay, it generally blocks at transaction granularity, which slows down log replay. To solve this problem, this thesis adopts a log scheduling method based on Table-ID and blocks only the single conflicting log record during replay, so that the replay of the other, conflict-free log records in the same transaction is not affected.

(2) A data consistency guarantee mechanism under the fine-grained replay method: First, to ensure data consistency between the replica node and the master node, a globally unique transaction commit queue is set up in the replica node so that the transaction commit order in the replica node is consistent with that of the master node. In addition, this thesis designs a row version ID field to solve the serialization problem that may occur during parallel log replay. At the same time, by exploiting the characteristics of MVCC, query operations can find different data versions.

(3) A dynamic allocation method for replay thread resources: In the log scheduling method based on Table-ID, as the workload in the master node changes, the amount of log data corresponding to each table in the replica node may also change, and the replay progress of the log records corresponding to each table may become inconsistent. To solve this problem, this thesis proposes to dynamically allocate replay thread resources according to the amount of log data corresponding to each table, so that the replay progress of the log records corresponding to each table stays consistent.

Experiments show that the method of this thesis can effectively improve the log replay speed and data freshness in the replica node and reduce the response delay of query operations. Compared with the log scheduling method based on Transaction-ID, the log replay speed can be improved by 50% and the query response delay can be reduced by 70%. Comparative experiments against the traditional method based on Table-ID show that the dynamic allocation method for replay thread resources in this thesis can effectively keep the replay progress of the log records corresponding to each table consistent. The parallel log replay optimization method for HTAP systems presented in this thesis can effectively improve the log replay speed, thereby improving the data freshness of OLAP and reducing the response delay of query operations. This method is therefore of great significance for supporting query analysis scenarios that require real-time analysis.
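As an illustration of contribution (3), the following is a minimal sketch of one way replay threads could be divided among tables in proportion to their pending log volume. It is not the thesis's implementation: the function name `allocate_replay_threads`, the table names, and the choice of the largest-remainder rule for rounding are all assumptions made for this example.

```python
def allocate_replay_threads(pending_log_bytes, total_threads):
    """Split total_threads across tables in proportion to pending log volume.

    pending_log_bytes: dict mapping a table ID to the amount of log data
    (hypothetical unit: bytes) still waiting to be replayed for that table.
    Rounding uses the largest-remainder rule, an assumption of this sketch.
    """
    total_pending = sum(pending_log_bytes.values())
    if total_pending == 0 or total_threads <= 0:
        return {table: 0 for table in pending_log_bytes}

    allocation = {}
    remainders = {}
    for table, pending in pending_log_bytes.items():
        # Integer share plus the remainder that was rounded away.
        allocation[table], remainders[table] = divmod(
            total_threads * pending, total_pending
        )

    # Hand the threads lost to rounding to the tables with the largest remainders.
    leftover = total_threads - sum(allocation.values())
    by_remainder = sorted(remainders, key=remainders.get, reverse=True)
    for table in by_remainder[:leftover]:
        allocation[table] += 1
    return allocation


if __name__ == "__main__":
    backlog = {"orders": 600, "items": 300, "users": 100}
    print(allocate_replay_threads(backlog, 8))
```

Re-running such an allocation periodically, as the backlog per table changes, is one plausible way to keep replay progress across tables roughly aligned. A real system would also need to decide how often to rebalance and whether a table with a small backlog should still be guaranteed a thread.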
Keywords/Search Tags:HTAP, log replay, conflict detection, resource allocation, data freshness