Font Size: a A A

Research On Cardinality Estimation Of Multi-Attribute Queries Based On Local Depth Autoregressive Framework

Posted on:2024-05-17Degree:MasterType:Thesis
Country:ChinaCandidate:Q W ChengFull Text:PDF
GTID:2568306920950789Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Query optimizer is one of the core components in relational database.As an important part of query optimization,the main task of cardinality estimation is to estimate the number of tuples returned by each operator in query execution plans.Only based on efficient and accurate estimation results can the query optimizer choose the optimal execution plan for query execution and shorten query response time.Because of complex correlations among query attributes,multi-attribute query cardinality estimation is still one of the most challenging problems in cardinality estimation.Cardinality estimation about multi-attribute query requires the construction of accurate joint distribution of multiple attributes in a limited space and response quickly in a short time.The existing technologies are mainly divided into two categories.One is the traditional method,which is mostly independent decomposition of multi-column data distribution based on,independence assumption.The other one is learning-based method,including query-driven approach and data-driven approach.Query-driven method mainly captures the mapping between query and real cardinality from query information.Data-driven method estimates cardinality by learning the distribution of the underlying data,which have better generalization performance than query-driven method.Because of excellent processing ability for multi-dimension data,which can effectively construct joint distribution of attributes,autoregressive cardinality estimators based on datadriven obtain more attention.Existing autoregressive cardinality estimators construct global networks for attributes with different correlations,lacking identification and differential modeling,which lead to complex model and low estimation efficiency.What’s more,most datadriven methods just rely on data information and lack of targeted correction to models using query information,resulting in insufficient estimation accuracy and inability to meet some practical requirements with high accuracy.To solve the above issues,we focus on the problem of the multi-attribute queries cardinality estimation,which has two challenges calling for designing new models.One is how to distinguish strongly associated attributes and weakly associated attributes and model them differently,so as to balance the accuracy and efficiency of multi-attribute query cardinality estimation?And the other is how to use both data information and query information to improve the estimation accuracy and robustness of the model?To address these challenges,a Local depth Autoregressive Framework(LAF)for multi-attribute queries cardinality estimation is proposed.LAF uses the local strategy for fine-grained modeling of strong and weak correlation attributes,and utilizes data information and query information to train the model comprehensively.To the best of our knowledge,LAF is the first cardinality estimation method that uses the local strategy to build autoregressive models.The main work and contributions of this thesis are summarized as follows.Firstly,a fine-grained modeling method is proposed to model strong and weak association attributes differently,which solves the problem that the accuracy and efficiency are difficult to balance in multi-attribute queries cardinality estimation.LAF judges the strong and weak correlation between attributes through mutual information,using autoregressive models to capture the joint distribution between strong correlation attributes and outputting local estimations,using lightweight regression model to capture the complex mapping between weakly correlated local estimations and real cardinality.The fine-grained modeling method captures the data distribution and mapping relationship,avoiding the complex learning among weakly correlated attributes,thus ensuring the estimation accuracy and improving the estimation efficiency.Information entropy is also used to determine the attribute order in autoregressive model,which further improves the estimation accuracy.Secondly,a comprehensive training scheme is presented which effectively utilizes data information and query information to solve the problem that the multi-attribute queries cardinality estimation model is difficult to balance accuracy and robustness.Based on the idea of simultaneously learning knowledge from data and query information,LAF puts the two types of information into different structures orderly.The local autoregressive model uses data information for unsupervised learning through maximum likelihood estimation,while the lightweight regression model amplifies the difference between data based on logarithmic transformation and uses query information for supervised learning.The comprehensive training scheme balances the estimation accuracy and robustness of LAF under multiple query sets with different data distribution.Thirdly,the excellent performance of LAF in solving the cardinality estimation problem of multi-attribute query is verified through extensive experiments conducted on multiple datasets.Sufficient single-table experiments are conducted on DMV,Census and IOV datasets,and comparative experiments are conducted on the attribute ordering of the autoregressive models and the importance of query information.Abundant multi-table experiments are conducted on IMDB and IOV-large datasets.We select multiple baseline models for comparative experiments and comprehensively evaluate the models from the perspectives of estimation accuracy,estimation efficiency and space occupation.The results show that LAF can achieve both efficient and accurate estimation results on multiple datasets.LAF has superiority in multiple indicators compared to the baseline methods,obtaining the estimation results with low tail error within shorter time and smaller space occupation.
Keywords/Search Tags:Query Optimization, Cardinality Estimation, Multi-attribute Query, Deep Autoregressive Model
PDF Full Text Request
Related items