
Statistical Information Management Technology For Self-government Database

Posted on: 2007-04-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X J Li
GTID: 1118360182493819
Subject: Computer application technology
Abstract:
With the development of web and information technologies, database systems have become increasingly complex, and managing them has become increasingly expensive. In response, self-managing databases have been proposed. The main technologies in this area include automatic recommendation of indexes, materialized views, and table partitioning, as well as automatic recommendation and maintenance of statistics. Statistics are highly sensitive to data changes and essential for choosing optimal query plans, and their accuracy strongly affects query-processing efficiency, so automating their management is of particular value.

Current work on automatic statistics management mainly relies on a background routine that scans or samples the data. Because this lowers query-processing efficiency, such routines can run only offline or when the system is idle. Meanwhile, because of weak maintenance tactics and limited feedback information, some feedback-based approaches are insensitive to changes in the query workload and perform poorly under high data skew. In this dissertation, we propose a framework named Self-Adaptive Statistics Management (SASM). SASM uses query feedback to recommend and update statistics in a self-learning way, and it exploits characteristics of query plans to obtain more detailed feedback for automatic statistics maintenance. It can greatly improve the accuracy of statistics while interfering little with normal query processing.

Existing approaches to automatic statistics collection gather very limited statistics at a high cost. To overcome these shortcomings, we propose plan-based statistics collection, which collects statistics on attribute data distributions by exploiting characteristics of different query plans, such as index scans and sorting.
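Plan-based collection piggybacks on operators that already deliver rows in attribute order. The sketch below is written under assumptions not taken from the dissertation: it wraps such a sorted stream (for example, the output of a sort or an index scan) and builds an equi-depth histogram in the same pass, so no separate scan or sample is needed. The function name `collect_while_sorted`, the `depth` parameter (rows per bucket), and the `(low, high, row_count)` bucket layout are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

Row = TypeVar("Row")


def collect_while_sorted(
    rows: Iterable[Row],
    key: Callable[[Row], float],
    depth: int,
    histogram: List[Tuple[float, float, int]],
) -> Iterator[Row]:
    """Pass rows through unchanged while building an equi-depth histogram on key.

    Hypothetical sketch: assumes rows arrive in ascending key order, as from a
    sort operator or an index scan. Each bucket is (low, high, row_count);
    the last bucket may hold fewer than `depth` rows.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    lo = prev = None
    count = 0
    for row in rows:
        k = key(row)
        if prev is not None and k < prev:
            raise ValueError("input is not sorted on the histogram key")
        if count == 0:
            lo = k  # first row of a new bucket
        count += 1
        prev = k
        if count == depth:
            histogram.append((lo, k, count))
            count = 0
        yield row  # the query plan consumes the row as usual
    if count:
        # Record the final, possibly partial, bucket once the stream is exhausted.
        histogram.append((lo, prev, count))
```

For example, wrapping the sorted input of a merge join with this generator leaves the histogram filled once the join has consumed every row. The final bucket is recorded only when the consumer exhausts the stream, so an operator that stops early (such as a `LIMIT`) would yield a partial histogram.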
Plan-based collection thus not only obtains more detailed statistics but also makes statistics collection more efficient.

To prevent later refinements from undoing earlier ones, and to avoid high estimation error under heavy data skew, we propose Self-Learning Histograms (SLH). An SLH learns its errors from query feedback and improves its accuracy by correcting them. It remembers the query feedback history through a simple encoding of feedback information; this both prevents later adjustments from undoing earlier ones and allows more statistics to be deduced. When available memory is short, SLH releases buckets through inner and global memory reforming. Global memory reforming allocates more memory to essential histograms and reclaims memory from nonessential ones; in this way, SASM recommends statistics suited to different workloads. SLH thereby overcomes two shortcomings of existing approaches: insensitivity to workload changes and inefficient use of space.

The accuracy of statistics depends not only on their own structure and maintenance tactic but also on the cost-estimation tactic of the query optimizer. Traditional estimation tactics do not distinguish refined statistics from outdated statistics in histograms, which may cause estimation errors when updates are frequent. We propose rule-based cost estimation, which uses query feedback information for cost estimation. Here, rules are query feedback records, which reflect updates to the data, and a suitable rule-based estimation tactic can reduce estimation error. SASM provides two tactics, cost estimation based on the most similar rule and cost estimation based on all basic rules, both of which improve the accuracy of query output-size estimates.

We developed a prototype system for SASM and ran extensive tests to evaluate its efficiency.
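The feedback-driven refinement and bucket release that SLH performs can be illustrated with a small sketch. It is not the dissertation's algorithm: the one-dimensional numeric attribute, half-open bucket ranges, uniform spread within each bucket, and feedback stored as hypothetical `(low, high, actual_rows)` records (standing in for SLH's unspecified feedback encoding) are all assumptions. Inner memory reforming is modeled here as merging the adjacent pair of buckets with the most similar density, preferring pairs whose shared boundary is not a remembered feedback boundary. Global reforming across histograms is not shown. The names `Bucket`, `SelfLearningHistogram`, `refine`, `estimate`, and `max_buckets` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Bucket:
    lo: float     # inclusive lower bound
    hi: float     # exclusive upper bound
    count: float  # rows in [lo, hi), assumed uniformly spread


@dataclass
class SelfLearningHistogram:
    """Hypothetical sketch of a histogram refined by query feedback."""
    buckets: List[Bucket]  # contiguous, sorted by lo
    max_buckets: int = 8   # memory budget in buckets
    feedback: List[Tuple[float, float, float]] = field(default_factory=list)

    def estimate(self, lo: float, hi: float) -> float:
        """Estimated row count of the range [lo, hi)."""
        total = 0.0
        for b in self.buckets:
            overlap = min(hi, b.hi) - max(lo, b.lo)
            if overlap > 0:
                total += b.count * overlap / (b.hi - b.lo)
        return total

    def _split_at(self, x: float) -> None:
        # Split the bucket strictly containing x, dividing its count uniformly.
        for i, b in enumerate(self.buckets):
            if b.lo < x < b.hi:
                left = Bucket(b.lo, x, b.count * (x - b.lo) / (b.hi - b.lo))
                right = Bucket(x, b.hi, b.count - left.count)
                self.buckets[i:i + 1] = [left, right]
                return

    def refine(self, lo: float, hi: float, actual: float) -> None:
        """Learn from one feedback record: the true row count of [lo, hi).

        Assumes the feedback range lies within the histogram's domain.
        """
        self._split_at(lo)
        self._split_at(hi)
        inside = [b for b in self.buckets if b.lo >= lo and b.hi <= hi]
        if inside:
            est = sum(b.count for b in inside)
            for b in inside:
                # Rescale so the buckets covering [lo, hi) sum to the observed count.
                b.count = b.count * actual / est if est > 0 else actual / len(inside)
        self.feedback.append((lo, hi, actual))  # remembered feedback history
        self._reform()

    def _reform(self) -> None:
        # Inner-reforming sketch: merge buckets until the budget is met.
        protected = {x for lo, hi, _ in self.feedback for x in (lo, hi)}

        def density(b: Bucket) -> float:
            return b.count / (b.hi - b.lo)

        while len(self.buckets) > max(1, self.max_buckets):
            # Prefer merges that keep feedback boundaries, then the most similar densities.
            i = min(
                range(len(self.buckets) - 1),
                key=lambda j: (
                    self.buckets[j].hi in protected,
                    abs(density(self.buckets[j]) - density(self.buckets[j + 1])),
                ),
            )
            left, right = self.buckets[i], self.buckets[i + 1]
            self.buckets[i:i + 2] = [Bucket(left.lo, right.hi, left.count + right.count)]
```

Splitting at feedback boundaries means that feedback on a disjoint range leaves earlier refinements untouched. Feedback on an overlapping range still rescales the shared buckets, which is the case the dissertation's feedback encoding is described as handling and which this sketch does not attempt.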
Experimental results show that statistics maintained by SASM adapt well to changes in data distribution and workload, with high accuracy and an acceptable maintenance cost.
Keywords: Self-Managing Database, Query Optimization, Statistics, Self-Learning Histograms