Statistics on query expressions in relational database management systems

Posted on:2004-07-08

Degree:Ph.D

Type:Thesis

University:Columbia University

Candidate:Bruno, Nicolas

Full Text:PDF

GTID:2468390011471480

Subject:Computer Science

Abstract/Summary:

The query optimizer is the component in a relational database system that identifies efficient execution plans for input queries. Modern optimizers generally explore many alternative query plans in a cost-based manner. Specifically, the resource consumption and associated cost of each candidate plan is estimated, and the plan with the least expected cost is chosen for execution. The cost estimation for a plan depends on several factors, including resource availability during execution, the specific operators that compose the plan, and the size of intermediate results that would be generated during the plan execution. Among these factors, the intermediate-result size (or cardinality) estimation is the main source of inaccuracies during optimization: cardinality estimation typically relies on several simplifying assumptions that often do not hold in practice. Optimizers then sometimes base their decisions on inaccurate information and produce low-quality execution plans. To address this limitation, in this thesis we introduce the concept of SITS, which are statistics built on query expressions. SITS directly and accurately model intermediate results in a query execution plan, and therefore avoid error-prone simplifying assumptions during cardinality estimation. If optimizers have appropriate SITS available during optimization, the resulting query plans can be dramatically better than otherwise. Although SITs are a fairly simple concept, challenging problems need to be addressed before SITs can be seamlessly integrated into modern relational database systems. In this thesis we study three important challenges associated with SITS. First, we show how to modify query optimizers to exploit the additional statistical information provided by SITs without significantly increasing optimization time. Second, we study a spectrum of alternatives to create SITs, which balance efficiency of construction and accuracy of the resulting estimators. Third, we present techniques to recommend a small but highly beneficial set of SITs to materialize in a database system for a given query workload. In summary, we address the main obstacles for enabling SITs for optimization, namely which SITs to build, how to build them, and how to exploit them during optimization. Overall, SITs constitute a well-founded approach for dealing with complex data correlations, and positively impact the efficiency of relational database systems.

Keywords/Search Tags:

Relational database, SITS, Query, Plan, Execution

Related items

1	Based On Historical Query Relational Database Query Optimization Keyword Research Questions
2	Relational Database Query Optimization Technology To Achieve
3	Optimization Of Query Algorithm For Distributed Relational Database
4	Design And Implementation Of Integrated Query Middleware About Relational Database And Non-Relational Database For Structure Safety Monitoring
5	Query Execution Plan Cache Based On DM Embedded Database
6	Research On Migration Algorithm From Traditional Relational Database To Non Relational Database
7	Research On XML Storage And Query Based On Relational Database
8	The Study And Examination Of Relational Database Ciphertext Query In DBAAS
9	Database models and query languages for relational data and metadata query processing
10	Research On Constraint Relational Model Of Non-Classical Relational Database In Query Optimization