A Distributed Query Optimization Method Based On A Hierarchical Model

Posted on: 2024-02-16 | Degree: Master | Type: Thesis
Country: China | Candidate: B Li
GTID: 2568307070450544 | Subject: Engineering
Abstract/Summary:
With the exponential growth of data in modern society, information systems are widely used to manage and store information. Traditional single-node storage is constrained by hardware capacity, performance, and other resource limits, and it increasingly fails to meet business needs. Distributed data storage emerged in response. In a distributed environment, data tables are partitioned across different nodes, which may reside on different servers. When a business query must access data across nodes, the choice of nodes affects the final query time. Factors such as network bandwidth and the number of data tables exchanged between nodes drive this effect. Node selection is therefore critical in such distributed query scenarios. Once the query node has been chosen, query time can be reduced further by finding a good query plan on that node. The main contributions of this thesis are as follows:

1. For distributed cross-node data queries, a complete hierarchical solution (the hierarchical model) is proposed to reduce overall query time. The model has two layers. The outer layer selects nodes, and the inner layer optimizes the query on a single node.
2. The outer layer uses the QPC_EXT cost formula to calculate the network cost of querying specific nodes in a distributed setting. An iterative improvement algorithm then searches for the best query scheme over successive iterations. A greedy algorithm that computes and compares the original QPC supplies the initial state for iterative improvement. As a result, good query nodes are found without enumerating every possible node-selection scheme, and only a small number of iterations is needed. This approach reduces system resource consumption while also shortening query time.
3. The inner layer uses deep reinforcement learning to find a good query plan on the chosen node. Unlike earlier reinforcement-learning approaches to query optimization, this method does not require an expert query optimizer to take part. For training, a Sim2Real learning method is proposed. In the simulation stage, cardinality estimates are used to learn a preliminary relationship between queries and cost. This helps the parameters of the cost model train better in the subsequent practice stage. A timeout strategy is also proposed to improve training efficiency.
4. The inner layer also introduces a score formula. It scores the k query plans that the model outputs through Beam Search and selects the execution plan with the highest score. This allows the model to generate multiple candidate query plans rather than a single one.
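The outer layer in contribution 2 pairs a greedy initial state with iterative improvement. The abstract does not give the QPC or QPC_EXT formulas, so the sketch below substitutes a hypothetical cost function, `plan_cost`. It sums two parts:

- the time to ship every remotely read table to a coordinator node;
- the scan time of the most heavily loaded node.

The names `greedy_start` and `iterative_improvement`, the cost model, and the toy catalog are all illustrative assumptions, not the thesis's implementation.

```python
from typing import Dict, Tuple

Link = Tuple[str, str]


def plan_cost(coord: str, reads: Dict[str, str], sizes: Dict[str, float],
              bw: Dict[Link, float], scan_rate: float) -> float:
    """Hypothetical stand-in for QPC_EXT (the real formula is not in the abstract).

    Cost = time to ship each remotely read table to the coordinator node
         + scan time of the most heavily loaded node (nodes scan in parallel).
    """
    transfer = sum(sizes[t] / bw[(src, coord)] for t, src in reads.items() if src != coord)
    load: Dict[str, float] = {}
    for t, src in reads.items():
        load[src] = load.get(src, 0.0) + sizes[t]
    return transfer + max(load.values()) / scan_rate


def greedy_start(nodes, replicas, sizes, bw, scan_rate):
    """Greedy initial state: for every candidate coordinator, place tables largest-first
    on the replica that minimizes the cost so far, then keep the cheapest coordinator."""
    best = None
    for coord in nodes:
        reads: Dict[str, str] = {}
        for t in sorted(sizes, key=sizes.get, reverse=True):
            reads[t] = min(replicas[t],
                           key=lambda n: plan_cost(coord, {**reads, t: n}, sizes, bw, scan_rate))
        cost = plan_cost(coord, reads, sizes, bw, scan_rate)
        if best is None or cost < best[0]:
            best = (cost, coord, reads)
    return best


def iterative_improvement(nodes, replicas, sizes, bw, scan_rate, max_iters=50):
    """Hill climbing from the greedy state. Each round tries every single-step move
    (switch coordinator, or read one table from another replica) and keeps improvements."""
    cost, coord, reads = greedy_start(nodes, replicas, sizes, bw, scan_rate)
    iters = 0
    for iters in range(1, max_iters + 1):
        improved = False
        for n in nodes:  # move type 1: change the coordinator node
            c = plan_cost(n, reads, sizes, bw, scan_rate)
            if c < cost:
                cost, coord, improved = c, n, True
        for t in list(reads):  # move type 2: read table t from a different replica
            for n in replicas[t]:
                cand = {**reads, t: n}
                c = plan_cost(coord, cand, sizes, bw, scan_rate)
                if c < cost:
                    cost, reads, improved = c, cand, True
        if not improved:  # local optimum reached
            break
    return cost, coord, reads, iters


# Hypothetical toy cluster: sizes in MB, bandwidth in MB/s, scan rate in MB/s.
NODES = ["A", "B", "C"]
SIZES = {"orders": 800.0, "customer": 200.0, "nation": 5.0}
REPLICAS = {"orders": ["A", "B"], "customer": ["B", "C"], "nation": ["A", "B", "C"]}
BW = {(a, b): (100.0 if {a, b} == {"A", "B"} else 50.0)
      for a in NODES for b in NODES if a != b}

greedy_cost = greedy_start(NODES, REPLICAS, SIZES, BW, scan_rate=400.0)[0]
cost, coord, reads, iters = iterative_improvement(NODES, REPLICAS, SIZES, BW, scan_rate=400.0)
print(f"greedy={greedy_cost:.4f} final={cost:.4f} coord={coord} reads={reads} iterations={iters}")
```

In this toy setup the greedy start is already a local optimum, so the hill climber stops after one round. That is consistent with the abstract's point that a good initial state keeps the iteration count small, though a toy example says nothing about the thesis's actual workloads.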
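Contribution 3 mentions a timeout strategy for speeding up training but does not specify it. The sketch below assumes one plausible form:

- each candidate plan's execution is capped at `factor` times the fastest latency observed so far for the same query;
- a killed plan is labeled with the cap instead of its true latency.

The function `run_with_timeouts`, the `factor` parameter, and the latency values are hypothetical.

```python
import math
from typing import List, Tuple


def run_with_timeouts(latencies: List[float], factor: float = 2.0) -> Tuple[float, List[float], int]:
    """Simulate executing the plans an agent proposes for one query during training.

    latencies: hypothetical true run times (seconds) of successive candidate plans.
    A plan is killed once it runs longer than factor * (best latency seen so far),
    and the killed plan is labeled with the timeout value.
    Returns (total execution time spent, training labels, number of timeouts).
    """
    best = math.inf
    spent, labels, timeouts = 0.0, [], 0
    for lat in latencies:
        timeout = factor * best  # infinite until some plan has finished
        if lat > timeout:
            spent += timeout  # we only wait until the deadline
            labels.append(timeout)
            timeouts += 1
        else:
            spent += lat
            labels.append(lat)
            best = min(best, lat)
    return spent, labels, timeouts


latencies = [10.0, 3.0, 120.0, 4.0, 60.0]
spent, labels, timeouts = run_with_timeouts(latencies)
print(spent, sum(latencies), labels, timeouts)  # → 29.0 197.0 [10.0, 3.0, 6.0, 4.0, 6.0] 2
```

Here the two slow plans are cut off at 6 seconds each, so the simulated episode costs 29 seconds of execution instead of 197.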
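Contribution 4 re-scores the k plans returned by Beam Search and keeps the highest-scoring one. The abstract does not state the thesis's score formula, so this sketch assumes one: the policy's cumulative log-probability minus `lam` times the log of an estimated C_out cost. The trained network is replaced by a hypothetical softmax that favors small intermediate results. `CARD`, `SEL`, and all function names are illustrative.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical catalog: table cardinalities and pairwise join selectivities.
CARD = {"A": 1000, "B": 100, "C": 10, "D": 5000}
SEL = {frozenset("AB"): 0.01, frozenset("BC"): 0.1, frozenset("CD"): 0.001}

Plan = Tuple[str, ...]  # a left-deep join order


def join_size(tables: Plan) -> float:
    """Estimated result size of joining `tables` (independence assumption;
    a pair with no predicate is treated as a cross product)."""
    size = 1.0
    for t in tables:
        size *= CARD[t]
    for i in range(len(tables)):
        for j in range(i + 1, len(tables)):
            size *= SEL.get(frozenset((tables[i], tables[j])), 1.0)
    return size


def cost(order: Plan) -> float:
    """C_out cost: sum of the sizes of all intermediate join results."""
    return sum(join_size(order[:i]) for i in range(2, len(order) + 1))


def policy_logprobs(prefix: Plan, remaining: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for the trained DRL policy: a log-softmax that prefers
    the next table yielding the smallest intermediate result."""
    logits = {t: -math.log(join_size(prefix + (t,))) for t in remaining}
    m = max(logits.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return {t: v - log_z for t, v in logits.items()}


def beam_search(tables: List[str], k: int) -> List[Tuple[Plan, float]]:
    """Keep the k highest-log-probability partial join orders at every step."""
    beams: List[Tuple[Plan, float]] = [((), 0.0)]
    for _ in range(len(tables)):
        candidates = []
        for prefix, lp in beams:
            remaining = [t for t in tables if t not in prefix]
            for t, step_lp in policy_logprobs(prefix, remaining).items():
                candidates.append((prefix + (t,), lp + step_lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:k]
    return beams


def score(order: Plan, logprob: float, lam: float = 1.0) -> float:
    """Hypothetical score: model confidence minus lam * log(estimated cost)."""
    return logprob - lam * math.log(cost(order))


TABLES = ["A", "B", "C", "D"]
plans = beam_search(TABLES, k=3)
best_order, best_lp = max(plans, key=lambda p: score(*p))
for order, lp in plans:
    print(order, round(score(order, lp), 3))
print("chosen:", best_order)
```

Setting `k` to the number of possible join orders (24 for four tables) turns the beam search into exhaustive enumeration. This makes a convenient sanity check, because the probabilities of all complete orders then sum to 1.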
Keywords/Search Tags: Distributed, Reinforcement learning, Iterative improvement