
Research And Implementation Of Adaptive Multi-table Join Cardinality Estimation Optimization Method Based On Correlation Analysis

Posted on: 2024-06-29  Degree: Master  Type: Thesis
Country: China  Candidate: Q Chen  Full Text: PDF
GTID: 2568306923452274  Subject: Computer technology
Abstract/Summary:
Cardinality estimation is one of the most important and challenging problems in query optimization. Its results are the input to the cost model, which is responsible for selecting an optimal execution plan from the plan space. Multi-table join cardinality estimation is among the hardest parts of this problem. Taking PostgreSQL as an example, when the number of joined tables reaches 6, over 95% of its join cardinality estimates are erroneous to some degree, and the median prediction error can exceed a factor of 100.

Existing methods for estimating multi-table join cardinality can be divided into traditional methods and learning-based methods. The former estimate the cardinality of join queries with techniques such as data summaries and sampling, while the latter predict it with machine learning methods such as regression models. These methods have the following shortcomings. Firstly, traditional methods largely rely on the simple assumption of attribute independence and ignore the "multi-table joint query correlation" between data. In addition, most learning-based methods depend on overly large models to learn the joint distribution of multiple tables, which leads to longer planning time and poor generalization. Last but not least, none of these methods makes effective use of the large amount of real query execution statistics, such as the true cardinality, so they miss the opportunity to improve accuracy from feedback.

In response to these shortcomings, and driven by the needs of a domestic database cooperation project, this thesis studies an adaptive multi-table join cardinality estimation optimization method based on correlation analysis. The research addresses the following challenges. Firstly, how can the semantic correlation of data column values be captured, and how can string predicates be represented efficiently and comprehensively? In addition, how can a lightweight multi-table join cardinality estimation model be built that captures inter-table correlation and generalizes well? Last but not least, how can query execution statistics be used effectively to improve the accuracy of cardinality estimation models?

To meet these challenges, this thesis proposes AMT-JcECR, an adaptive multi-table join cardinality estimation optimization method based on correlation analysis. It uses a word vector model to capture and quantify the correlation in query statements, and a multi-set convolutional network to learn and estimate cardinality from the query correlation features. It adaptively collects query execution statistics and feeds them into the cardinality estimation model, which continuously learns from this feedback and adjusts itself. The main work and contributions of this thesis are as follows:

1. This thesis proposes LVC, a row vector encoding method based on the word vector model, which captures the semantic correlation of data column values and effectively expresses traditional string predicates. A multi-table join query can be divided into a table set, a join set, a predicate set, and so on, and set semantics is used to represent the query features and the true cardinality labels. For a complex predicate set containing strings, the row data of the base tables serves as the corpus of the word vector model, from which predicate correlation is learned semantically and represented as embeddings. The result is a query feature vector that contains the "multi-table joint query correlation".

2. This thesis proposes MCNCE, a multi-set convolutional network cardinality estimation model based on the multi-layer perceptron, which aims to capture inter-table correlation in multi-table joins while generalizing well. MCNCE builds a multi-set convolutional network from multi-layer perceptrons and combines sampling with deep learning over the feature data to capture the correlation between tables. The query feature sets, represented with set semantics and encoded by LVC, are fed into the corresponding modules of the network, which helps the model generalize to unseen instances of the same structure and to join queries over more tables.

3. This thesis proposes ACE, an adaptive cardinality estimation framework based on feedback from execution statistics, which makes effective use of feedback statistics in cardinality estimation. The core of ACE is the interaction between the cardinality estimation model MCNCE and the query optimizer. The estimates produced by the model are injected into the database optimizer, and the execution engine then executes the query to obtain the feedback statistics. Information such as the top n queries with the largest estimation error, together with their actual cardinality, is encoded with the LVC method and used to update the training set of the cardinality estimation model. Through this process, the model continuously improves the accuracy of its estimates.

4. To verify the effectiveness and advanced nature of the proposed method, this thesis conducts a series of comparative experiments on existing public datasets and on synthetic datasets. Through a combination of qualitative and quantitative analysis, it demonstrates that the proposed method outperforms the baseline methods, and it verifies that capturing semantic correlation in queries and adaptive query feedback both benefit multi-table join cardinality estimation. In addition, a series of experiments examines the impact of the individual modules of each method on the results, to verify the effectiveness of each component.
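The feedback step of ACE described in contribution 3 can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not name the error measure, so the sketch assumes q-error (the ratio between estimate and truth, whichever way is larger), and the `Feedback` record and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    """One executed query with its estimate and true result size (hypothetical record)."""
    query: str        # query identifier or SQL text
    estimated: float  # cardinality predicted by the estimation model
    actual: float     # true cardinality reported by the execution engine


def q_error(estimated: float, actual: float) -> float:
    """Assumed error measure: max(est/act, act/est), with both values clamped to >= 1."""
    est = max(estimated, 1.0)
    act = max(actual, 1.0)
    return max(est / act, act / est)


def select_top_n(feedback: list[Feedback], n: int) -> list[Feedback]:
    """Return the n executed queries whose estimates were furthest from the truth."""
    ranked = sorted(
        feedback,
        key=lambda f: q_error(f.estimated, f.actual),
        reverse=True,
    )
    return ranked[:n]


if __name__ == "__main__":
    executed = [
        Feedback("q1", estimated=100, actual=110),
        Feedback("q2", estimated=10, actual=1000),
        Feedback("q3", estimated=5000, actual=25),
    ]
    # The selected queries and their actual cardinalities would then be
    # encoded and appended to the training set before retraining.
    for f in select_top_n(executed, n=2):
        print(f.query, q_error(f.estimated, f.actual))
```

Clamping both values to at least 1 keeps the ratio defined when a query returns no rows. Other error measures would fit the same selection step.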
Keywords/Search Tags: Multi-table Join, Cardinality Estimation, Correlation Analysis, Adaptive Query Feedback