
Distribution-density Based Histograms For Selectivity Estimation

Posted on: 2011-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Feng
GTID: 2178360308954099
Subject: Computer application technology
Abstract/Summary:
Estimating query selectivity is one of the most important issues in query processing and optimization, and histograms are an effective tool for accurate selectivity estimation. In most database systems, the task of the query optimizer is to choose an efficient execution plan. Selecting the best plan requires accurate estimates of the costs of the alternative plans, and one of the most important factors affecting plan cost is selectivity. Therefore, in most cases, the accuracy of selectivity estimates directly determines whether the optimal plan is chosen. Many selectivity estimation methods have been proposed. Most of them, however, require extra I/O accesses to the database solely to collect statistics. This procedure can be expensive and should be performed offline or when the system is lightly loaded. In addition, most methods are effective only for low-dimensional data; for example, some histograms work well only when the data have no more than three dimensions, and their performance declines rapidly as the dimensionality increases. Partitioning high-dimensional spaces is a challenging problem, so techniques for building histograms for selectivity estimation in high-dimensional spaces remain an open research concern.

In this thesis, we present a new method for building histograms for selectivity estimation. The main idea is to build the histogram from the distribution density of the database's domain, so that the data distribution within each bucket is uniform or nearly uniform. Extensive experiments were carried out to measure the performance of this method. The results indicate that it is highly competitive with existing techniques on low-dimensional (2-, 3-, and 4-dimensional) data and is also effective on high-dimensional (25- and 104-dimensional) data.
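The abstract does not give the construction algorithm, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the thesis's method. Buckets are n-dimensional hyper-rectangles. A bucket is treated as "nearly uniform" when cutting it at the midpoint of every dimension leaves roughly equal counts on both sides; otherwise it is bisected along the dimension where the density varies most. The uniformity test and the names `build_histogram`, `estimate_selectivity`, `tol`, `min_count`, and `max_depth` are hypothetical. Selectivity is then estimated with the usual uniform-spread assumption: each bucket contributes its count multiplied by the fraction of its volume covered by the query box.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


@dataclass
class Bucket:
    lo: Vec     # lower corner of the bucket's hyper-rectangle
    hi: Vec     # upper corner of the bucket's hyper-rectangle
    count: int  # number of tuples assigned to this bucket


def build_histogram(points: Sequence[Vec], lo: Vec, hi: Vec,
                    tol: float = 0.2, min_count: int = 8,
                    max_depth: int = 12) -> List[Bucket]:
    """Recursively bisect [lo, hi] until each bucket looks (nearly) uniform.

    Hypothetical uniformity test: cut the bucket at the midpoint of each
    dimension; if both halves hold roughly equal counts (relative imbalance
    <= tol) along every dimension, the density is treated as uniform.
    """
    lo, hi = tuple(lo), tuple(hi)
    n = len(points)
    if n <= min_count or max_depth == 0:
        return [Bucket(lo, hi, n)]
    best_dim, worst = -1, tol
    for d in range(len(lo)):
        if hi[d] <= lo[d]:                 # zero-width side: cannot be split
            continue
        mid = (lo[d] + hi[d]) / 2
        left = sum(1 for p in points if p[d] < mid)
        imbalance = abs(2 * left - n) / n  # 0.0 means perfectly balanced halves
        if imbalance > worst:
            best_dim, worst = d, imbalance
    if best_dim < 0:                       # every axis looks uniform: stop here
        return [Bucket(lo, hi, n)]
    d = best_dim                           # split where density varies the most
    mid = (lo[d] + hi[d]) / 2
    left_pts = [p for p in points if p[d] < mid]
    right_pts = [p for p in points if p[d] >= mid]
    left_hi = hi[:d] + (mid,) + hi[d + 1:]
    right_lo = lo[:d] + (mid,) + lo[d + 1:]
    return (build_histogram(left_pts, lo, left_hi, tol, min_count, max_depth - 1)
            + build_histogram(right_pts, right_lo, hi, tol, min_count, max_depth - 1))


def overlap_fraction(b: Bucket, qlo: Vec, qhi: Vec) -> float:
    """Fraction of the bucket's volume that lies inside the query box."""
    frac = 1.0
    for a, c, x, y in zip(b.lo, b.hi, qlo, qhi):
        if c <= a:                         # degenerate side: in or out entirely
            frac *= 1.0 if x <= a <= y else 0.0
        else:
            frac *= max(0.0, min(c, y) - max(a, x)) / (c - a)
    return frac


def estimate_selectivity(hist: List[Bucket], qlo: Vec, qhi: Vec) -> float:
    """Uniform-spread assumption: each bucket adds count * covered fraction."""
    total = sum(b.count for b in hist)
    if total == 0:
        return 0.0
    return sum(b.count * overlap_fraction(b, qlo, qhi) for b in hist) / total


if __name__ == "__main__":
    import random

    random.seed(0)
    # Skewed 2-D data: a dense cluster plus a sparse uniform background.
    pts = [(random.gauss(0.3, 0.05), random.gauss(0.7, 0.05)) for _ in range(2000)]
    pts += [(random.random(), random.random()) for _ in range(500)]
    lo = tuple(min(p[d] for p in pts) for d in range(2))
    hi = tuple(max(p[d] for p in pts) for d in range(2))
    hist = build_histogram(pts, lo, hi)
    qlo, qhi = (0.2, 0.6), (0.4, 0.8)
    true_sel = sum(all(qlo[d] <= p[d] <= qhi[d] for d in range(2)) for p in pts) / len(pts)
    print(f"{len(hist)} buckets, estimated {estimate_selectivity(hist, qlo, qhi):.3f}, "
          f"true {true_sel:.3f}")
```

Midpoint bisection keeps every bucket an axis-aligned hyper-rectangle, so the overlap with a range query is a simple per-dimension product. Dense regions end up split into smaller buckets, while sparse, evenly spread regions stay coarse. Each level costs O(n·d) counting work, so a real implementation for 25- or 104-dimensional data would likely need a cheaper density test than this sketch uses.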
Keywords/Search Tags: Selectivity Estimation, Distribution Density, Histogram, N-hyper rectangle