Learning-Based Multi-Attribute Cardinality Estimation

Posted on:2024-01-30

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Hong

Full Text:PDF

GTID:2568307052996369

Subject:Electronic information

Abstract/Summary:

The optimizer in a database selects the cheapest of several candidate query plans to execute, and cardinality estimation plays a significant role in computing the cost of each plan. Modern database systems mostly rely on assumptions such as predicate independence to produce cardinality estimates. Traditional histogram-based methods are fast, but their estimates are inaccurate and can lead the optimizer to choose a suboptimal query plan. To address this, this thesis applies a machine learning model to the cardinality estimation problem in order to improve estimation accuracy in the database. The thesis first implements workload generation for cardinality estimation, then proposes an improved multi-attribute cardinality estimation model, and finally compares it experimentally with other methods and presents a process for integrating a cardinality estimation model with a modern database. The details are as follows:

(1) A diverse workload suited to the cardinality estimation problem is implemented. Unlike workloads generated by rules, the cardinality-estimation-oriented workload discards assumptions such as predicate independence and combines attributes from multiple columns of a single table. The resulting workload covers a wide range of distributions and correlations and is suitable for production use. Each workload is a union of query selection blocks, where each query block is determined by its query center and extent size.

(2) An improved multi-attribute cardinality estimation algorithm is proposed. The model learns complex regression functions from training data, and iteratively trains and combines tree ensembles and neural networks to improve feature extraction and regression capabilities. To meet the optimizer's low-latency requirement, a lightweight model is used for fast learning and high-accuracy prediction, producing estimates within milliseconds.

(3) The improved model is evaluated in comparative experiments and dynamic experiments against traditional databases and other learning-based models. The gap between the estimated cardinality and the true cardinality serves as the accuracy measure. In addition, the structure of the MySQL optimizer is analyzed, and an application scenario integrated with modern databases is proposed in which the model is combined with the actual optimizer for plan selection.

The experimental results show that the improved multi-attribute cardinality estimation model performs best on the same dataset and workload, achieving the highest accuracy, and reaches a low error even with a shorter update time, which provides an important foundation for subsequent practical application.
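
The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the toy table, the function names, and the use of q-error as the accuracy measure are all assumptions made for illustration (the abstract only says that the gap between estimated and true cardinality is measured). The sketch builds a query block from a center and an extent, counts the true cardinality of a union of blocks, and compares it with an estimate made under the predicate-independence assumption on two correlated columns.

```python
import random


def make_table(n_rows, seed=0):
    """Toy table with two correlated integer columns (b is a plus small noise)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        a = rng.randint(0, 99)
        b = min(99, max(0, a + rng.randint(-5, 5)))
        rows.append((a, b))
    return rows


def query_block(center, extent):
    """One range predicate per column: [center - extent/2, center + extent/2]."""
    return [(c - e / 2, c + e / 2) for c, e in zip(center, extent)]


def in_block(row, block):
    """True if every column value of the row lies inside the block's range."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(row, block))


def true_cardinality(rows, blocks):
    """Count rows matching at least one block (a workload query is a union of blocks)."""
    return sum(any(in_block(row, b) for b in blocks) for row in rows)


def independence_estimate(rows, block):
    """Estimate for a single block: multiply per-column selectivities as if independent."""
    n = len(rows)
    selectivity = 1.0
    for i, (lo, hi) in enumerate(block):
        selectivity *= sum(lo <= row[i] <= hi for row in rows) / n
    return selectivity * n


def q_error(estimate, truth):
    """Assumed accuracy measure: max(est/true, true/est), with both clamped to >= 1."""
    est = max(float(estimate), 1.0)
    tru = max(float(truth), 1.0)
    return max(est / tru, tru / est)


rows = make_table(10_000)
block = query_block(center=(50, 50), extent=(10, 10))
truth = true_cardinality(rows, [block])
estimate = independence_estimate(rows, block)
print(truth, round(estimate, 1), round(q_error(estimate, truth), 2))
```

Because column b closely follows column a in this toy table, the independence-based estimate falls well below the true count. A learned multi-attribute model is meant to close this kind of gap.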

Keywords/Search Tags:

Cardinality estimation, machine learning, optimizer, database system, predicate selectivity

Related items

1	Research On Cardinality Estimation Based On Attention Mechanism
2	Cardinality Estimation Based On Multi-Feature Divided Composite Model
3	Research Of Selectivity Estimation Algorithm For String Predicates Based On Modified PST
4	Deep Autoregressive Model For Cardinality Estimation
5	Research And Implementation Of Database Cardinality Estimation Based On Causal Inference
6	SPARQL Query Optimization Based On Predicate Selectivity Estimation
7	New Approaches To Selectivity Estimation In Database Optimization
8	Study On Cardinality Estimation Method Based On Multi-Head Self-Attention Mechanism
9	Predicate Compiler Technology And Deep Code Optimization
10	Research On XML Cluster Storage & Selectivity Estimation Of Path Expression