
Learning-based Multi Attribute Cardinality Estimation

Posted on: 2024-01-30  Degree: Master  Type: Thesis
Country: China  Candidate: X Y Hong  Full Text: PDF
GTID: 2568307052996369  Subject: Electronic information
Abstract/Summary:
A database optimizer selects the cheapest of several candidate query plans for execution, and cardinality estimation plays a significant role in costing those plans. Modern database systems mostly rely on assumptions such as predicate independence to produce cardinality estimates. Traditional histogram-based methods are fast, but their estimates are inaccurate and can lead the optimizer to choose a suboptimal query plan. To address this, this thesis applies a machine learning model to the cardinality estimation problem in order to improve estimation accuracy in the database. It first implements workload generation for cardinality estimation, then proposes an improved multi-attribute cardinality estimation model, and finally compares the model experimentally with other methods and presents a process for integrating a cardinality estimation model with a modern database. The details are as follows:

(1) A diverse workload suited to cardinality estimation is implemented. Unlike workloads generated by rules, this workload discards assumptions such as predicate independence and combines attributes from multiple columns of a single table. The resulting queries are widely distributed and correlated, which makes the workload suitable for production use. Each workload is a union of query selection blocks, where each query block is defined by its query center and extent size.

(2) An improved multi-attribute cardinality estimation algorithm is proposed. The model learns complex regression functions from training data, and iteratively trains and combines tree ensembles and neural networks to improve feature extraction and regression capability. Because the optimizer requires low latency, a lightweight model is used to achieve fast learning and high-accuracy predictions while producing estimates in milliseconds.

(3) The improved model is evaluated in comparative and dynamic experiments against traditional databases and other learning-based models. Accuracy is measured by the gap between the estimated cardinality and the true cardinality. In addition, the structure of the MySQL optimizer is analyzed, and an application scenario is proposed in which the model is integrated with a modern database so that its estimates feed the actual optimizer's plan selection.

The experimental results show that, on the same dataset and workload, the improved multi-attribute cardinality estimation model performs best among the compared methods, achieving the highest accuracy, and that it reaches a lower error even with a shorter update time. This provides a solid basis for subsequent practical application.
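The ideas in the abstract (query blocks defined by a center and an extent, the predicate-independence assumption, and an accuracy measure comparing estimated and true cardinality) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the thesis's implementation: the function names, the two-column synthetic table, the choice of drawing centers from existing rows, and the use of q-error (a common ratio-based metric) as the accuracy measure, which the abstract does not name.

```python
import random


def make_table(n_rows, seed=0):
    """Hypothetical table with two correlated integer columns (b tracks a)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        a = rng.randint(0, 99)
        b = min(99, max(0, a + rng.randint(-5, 5)))
        rows.append((a, b))
    return rows


def make_query(rows, extent, rng):
    """One query block: a box around a center row, with half-width `extent`
    on every column. Returns a list of (low, high) ranges, one per column."""
    center = rng.choice(rows)
    return [(c - extent, c + extent) for c in center]


def true_cardinality(rows, query):
    """Count the rows that satisfy every range predicate of the query."""
    return sum(
        all(lo <= v <= hi for v, (lo, hi) in zip(row, query)) for row in rows
    )


def independence_estimate(rows, query):
    """Estimate under predicate independence: multiply the per-column
    selectivities, then scale by the table size."""
    n = len(rows)
    selectivity = 1.0
    for col, (lo, hi) in enumerate(query):
        selectivity *= sum(lo <= row[col] <= hi for row in rows) / n
    return selectivity * n


def q_error(estimate, truth):
    """Ratio-based error, always >= 1; both values are clamped to 1 so that
    empty results do not cause a division by zero."""
    est, tru = max(estimate, 1.0), max(truth, 1.0)
    return max(est / tru, tru / est)


if __name__ == "__main__":
    rows = make_table(10_000)
    rng = random.Random(1)
    workload = [make_query(rows, extent=5, rng=rng) for _ in range(5)]
    for query in workload:
        truth = true_cardinality(rows, query)
        est = independence_estimate(rows, query)
        print(query, truth, round(est, 1), round(q_error(est, truth), 2))
```

On a tiny perfectly correlated table, `[(1, 1), (2, 2), (3, 3), (4, 4)]`, the query `[(1, 2), (1, 2)]` matches 2 rows, while the independence estimate is 0.5 × 0.5 × 4 = 1.0, giving a q-error of 2.0. Correlated columns are exactly where the independence assumption underestimates, which is the gap a learned estimator aims to close.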
Keywords/Search Tags: cardinality estimation, machine learning, optimizer, database system, predicate selectivity