Font Size: a A A

Research On Database Index Selection Based On Learned Cost Estimator And Reinforcement Learning

Posted on:2023-08-30Degree:MasterType:Thesis
Country:ChinaCandidate:J L GaoFull Text:PDF
GTID:2558306845499654Subject:Software engineering
Abstract/Summary:PDF Full Text Request
In the era of big data,the surge increasing of information on the Internet has raised higher requirements to the query optimization ability of relational database,making database optimization an important research topic.With the development of deep learning technologies,applying deep learning for better query optimization ability becomes more and more popular.This thesis focuses on AI for DB,and mainly studies two important problems in database optimization: cost estimation and index selection.For cost estimation,traditional cost estimation and cardinality estimation models tend to make strict assumptions about the distribution of data in the database which are difficult to meet in real database,which often leads to a poor estimation quality.And existing deep learning-based cost estimation models cannot learn structure features from query plans effectively,and do not consider index features,so the performance can be further improved.For index selection,heuristicbased methods often utilize a greedy-like criterion,which may not find the best solution.Linear programming-based methods may have to lower the scale of solution space for acceptable running time.On the other hand,existing index selection methods tend to use the cost estimated by the database management system(DBMS)to evaluate indexes,which may also result in suboptimal solutions due to the limitations of the cost estimator in DBMS.To solve the problems in existing methods,this thesis proposes a new cost estimation model and a new index selection model for better query optimization ability.First,this thesis proposes a cost estimation model based on graph convolutional network(GCN)called GCNCost.To solve the problem that existing methods may not learning structure features accurately,GCNcost applies a query plan learning module based on graph convolution network which takes the adjacency matrix of query tree as a part to better learn its structural features.To solve the problem that the existing methods do not use index features,GCNcost introduces index features and utilizes an index selection module based on multi-dimensional attention to learn the local and global features of the index at the same time.Finally,GCNcost uses a regression layer to synthesize query plan features and index features and get the output.Experimental results on two commonly used public data sets show that GCNcost can make more accurate cost estimation than the baseline models.Second,this thesis proposes an index selection model based on Deep Q-Network(DQN)called Deep Index.To solve the problem that existing methods may not evaluate the indexes accurately,Deep Index utilizes GCNCost proposed in this thesis to evaluate indexes.To solve the problem that existing index selection policies may bring suboptimal solutions,Deep Index uses DQN to select indexes.Different from existing reinforcement learning-based methods,Deep Index considers the interactions among indexes by introducing the index overlap concept.The experimental results on two common datasets show that Deep Index can provide better and more stable index selection results compared with baseline models.Finally,in order to help the users to select indexes in real scenarios,this thesis develops an index selection system based on Deep Index.The system provides many functions around index selection.Because the cost estimation model in the system needs to collect training data for training,a query processing page is designed in the system to collect the historical query plan training cost estimation model while processing the users’ queries.Furthermore,considering that the existing historical query plan may not be enough to train the cost estimation model,users can choose to use GCNcost or traditional cost estimator for index selection on their own.
Keywords/Search Tags:Query optimization, Cost estimation, Index selection, Graph convolutional networks, Deep reinforcement learning
PDF Full Text Request
Related items