The Research On PostgreSQL Statistics Estimation Based On Block-level Sampling

Posted on:2008-10-25

Degree:Master

Type:Thesis

Country:China

Candidate:J Chen

Full Text:PDF

GTID:2198330332481736

Subject:Computer application technology

Abstract/Summary:

With the rapid development of computer hardware and software and the widespread use of computer systems across all industries, data has become a valuable resource for institutions of every kind. Database systems are now essential to research institutions, government bodies, and enterprises. As data volumes keep growing, one of the most important tasks of a DBMS is to maintain an acceptable level of performance for the systems built on it. The query optimizer is the primary mechanism for achieving this.

There are two kinds of query optimizer: cost-based and rule-based. Most current commercial DBMSs use a cost-based optimizer. Its advantage over a rule-based optimizer is that it estimates query cost from information about the database objects, so the DBMS can choose the optimal execution path for a query. That information is the statistics that the DBMS defines and gathers. Consequently, the precision of statistics estimation strongly affects the result of query cost estimation and is crucial to the quality of the optimizer. A DBMS can gather statistics either by computing them or by estimating them. Computing gives high precision at high cost and may increase system load, especially when analyzing large objects; estimating gives lower precision but costs less and imposes less load, even on large objects.

This thesis is based primarily on the open-source DBMS PostgreSQL. It analyzes the source code of the query optimizer and of statistics gathering, focuses on histograms and distinct-value counts, the statistics most crucial to query cost estimation, and develops a way to gather statistics in the DBMS by both computing and estimating. For sampling-based estimation it uses block-level sampling together with a histogram-based cross-validation algorithm that reduces the bias of the sample data below a preset threshold; it then constructs an equi-depth or value-based histogram on the sample and saves it in a system table for use by the optimizer. When estimating the number of distinct values, the layout of the data set can bias the sample, so a hybrid estimation method is used to resolve the problem caused by losing the f1 value (the number of values that appear exactly once). Experiments on Linux AS3 and PostgreSQL 8.1 show that the proposed solution suits PostgreSQL and improves the efficiency of gathering statistics while preserving estimation precision.
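
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the doubling schedule, and the error measure in the cross-validation loop are assumptions made for illustration. The distinct-value function uses the Haas-Stokes Duj1 formula, n*d / (n - f1 + f1*n/N), which stock PostgreSQL's ANALYZE applies; it stands in for the hybrid estimator, whose details the abstract does not give.

```python
import random
from bisect import bisect_right
from collections import Counter


def sample_blocks(blocks, k, rng):
    """Block-level sampling: pick k whole blocks and keep every row in them."""
    chosen = rng.sample(range(len(blocks)), k)
    return [row for i in chosen for row in blocks[i]]


def equi_depth_bounds(values, buckets):
    """Return buckets-1 separators that split the sorted values into
    buckets of (roughly) equal row count."""
    s = sorted(values)
    return [s[(i * len(s)) // buckets] for i in range(1, buckets)]


def bucket_error(bounds, values):
    """Largest gap between an observed bucket fraction and the ideal
    1/buckets when `values` are poured into the histogram (assumed metric)."""
    buckets = len(bounds) + 1
    counts = [0] * buckets
    for v in values:
        counts[bisect_right(bounds, v)] += 1
    return max(abs(c / len(values) - 1 / buckets) for c in counts)


def cross_validated_sample(blocks, start_blocks, buckets, threshold, rng):
    """Assumed loop: build a histogram on k sampled blocks, validate it on
    k fresh blocks, and double k until the error is within the threshold
    or no unseen blocks remain."""
    order = list(range(len(blocks)))
    rng.shuffle(order)  # any prefix is a uniform sample of blocks

    def rows_of(idxs):
        return [row for i in idxs for row in blocks[i]]

    k = start_blocks
    while True:
        sample = rows_of(order[:k])
        fresh = rows_of(order[k:2 * k])
        if not fresh:
            return sample
        if bucket_error(equi_depth_bounds(sample, buckets), fresh) <= threshold:
            return sample
        k *= 2


def estimate_distinct(sample, total_rows):
    """Duj1 estimate of the number of distinct values in the full table.
    f1 is the number of values seen exactly once in the sample."""
    n = len(sample)
    freq = Counter(sample)
    d = len(freq)
    f1 = sum(1 for c in freq.values() if c == 1)
    return n * d / (n - f1 + f1 * n / total_rows)
```

A table would be passed as a list of blocks, each block a list of column values, for example `cross_validated_sample(blocks, 2, 4, 0.05, random.Random(0))`. Sampling whole blocks saves I/O, but rows stored together are often correlated, which is why the sample is validated against fresh blocks before a histogram is built from it.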

Keywords/Search Tags:

Query optimization, Statistics, Cost estimation, Histogram, Cross-validation, Distinct-value

Related items

1	The Research On PostgreSQL Statistics Estimation Based On Block-level Sampling
2	Research And Improve On The Query Optimize Of MySQL
3	The Research On Selectivity Estimation Based On Xpath Path Expression
4	The Research On Selectivity Estimation Based On XPath Path Expression
5	Design And Implementation Of Cost Estimation Model In Da Meng DBMS
6	Research Of Image Enhancement Based On Probability Theory
7	Research On Spatial Query Optimization
8	A Study On Key Techniques Of Aggregate Query In Wireless Sensor Networks
9	Research On Cost Model For P2P-based Distributed Spatio-Temporal Indexing Range Query
10	Research And Implementation Of Software Cost Estimation Method Based On Constructive Cost ModelⅡ