Research On Approximate Query Processing Techniques In The Data Warehouse

Posted on: 2003-07-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Feng
Full Text: PDF
GTID: 1118360185996936
Subject: Computer application technology
Abstract/Summary:
Today's Decision Support System (DSS) applications need to pose very complex queries to the data warehouse. The huge volume of data and the complexity of the queries mean that a query can take a very long time to execute and produce an exact answer. Because many DSS applications are exploratory in nature, they can tolerate small errors in query results in return for large reductions in response time. Approximate query processing has therefore recently emerged as a viable solution for dealing with huge amounts of data and high query complexity.
There are a number of scenarios in which an exact answer may not be required, and a user may in fact prefer a fast, approximate answer. For example, during a drill-down query sequence in ad-hoc data mining, the initial queries in the sequence frequently have the sole purpose of determining the truly interesting queries and regions of the database. Providing (reasonably accurate) approximate answers to these initial queries lets users focus their explorations quickly and effectively, without consuming inordinate amounts of valuable system resources. Many aggregate queries likewise do not need precision to the last decimal place. In this dissertation, we address approximate query processing in the data warehouse and propose a cluster-based approximate query processing method (CAQP) based on the characteristics of the data in the data warehouse.
This dissertation first studies clustering techniques in depth and puts forward a new density-based and grid-based clustering algorithm, SCARG. The main idea of SCARG is to divide the data space into many rectangular cells; if the density of a cell is greater than a threshold, the cell is a dense cell. Connected dense cells together constitute a cluster. SCARG finds the primary portion of every cluster very quickly and then uses refinement techniques to build good clusters.
We also report experiments on both synthetic data and real data from the SEQUOIA 2000 benchmark, which validate the effectiveness and performance of SCARG against other well-known clustering algorithms (DBSCAN, CLARANS). To improve scalability, we also present PSCARG (Parallel SCARG), which is based on data partitioning.
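The grid-and-density idea the abstract describes for SCARG can be illustrated with a minimal sketch. This is not the author's SCARG: the abstract gives no details of the refinement step, so it is omitted here, and the function name `grid_density_clusters`, the parameters `cell_size` and `density_threshold`, the use of a point count as the cell density, and the choice to treat cells as connected when they share a face are all assumptions made for illustration.

```python
from collections import Counter, deque


def grid_density_clusters(points, cell_size, density_threshold):
    """Group dense grid cells into clusters of face-adjacent cells.

    Hypothetical helper: only the "dense cell" and "connected cells"
    ideas from the abstract are sketched; SCARG's refinement is not.
    """
    # Map each point to the integer coordinates of its grid cell and
    # count how many points fall into each cell.
    counts = Counter(
        tuple(int(coord // cell_size) for coord in point) for point in points
    )

    # A cell is dense when its point count exceeds the threshold
    # (assumption: density is measured as a raw point count).
    dense = {cell for cell, n in counts.items() if n > density_threshold}

    clusters = []
    seen = set()
    for start in dense:
        if start in seen:
            continue
        # Breadth-first search over dense cells that share a face.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            cell = queue.popleft()
            component.add(cell)
            for axis in range(len(cell)):
                for step in (-1, 1):
                    neighbour = (
                        cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    )
                    if neighbour in dense and neighbour not in seen:
                        seen.add(neighbour)
                        queue.append(neighbour)
        clusters.append(component)
    return clusters
```

Each returned cluster is a set of cell coordinates rather than a set of points; a fuller version would presumably map points back to their clusters and handle sparse cells on cluster borders, which is where a refinement step would come in.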
Keywords/Search Tags:Data Warehouse, Approximate Query Processing, Clustering Analysis, Data Cube, Data Compression