Research On Approximate Query Processing Techniques In The Data Warehouse

Posted on:2003-07-28

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Y Feng

GTID:1118360185996936

Subject:Computer application technology

Abstract/Summary:

Today's Decision Support System (DSS) applications need to pose very complex queries to the data warehouse. The huge amounts of data and the complexity of the queries mean that a query can take a very long time to execute and produce an exact answer. Because many DSS applications are exploratory in nature, they can tolerate small errors in query results in return for large reductions in response time. Approximate query processing has therefore recently emerged as a viable solution for dealing with huge amounts of data and high query complexity.
There are a number of scenarios in which an exact answer may not be required, and a user may in fact prefer a fast, approximate answer. For example, during a drill-down query sequence in ad-hoc data mining, the initial queries in the sequence frequently have the sole purpose of determining the truly interesting queries and regions of the database. Providing (reasonably accurate) approximate answers to these initial queries lets users focus their explorations quickly and effectively, without consuming inordinate amounts of valuable system resources. Likewise, many aggregate queries do not need precision to the last decimal place. In this dissertation, we address approximate query processing in the data warehouse and propose a cluster-based approximate query processing method (CAQP) that exploits the characteristics of the data in the data warehouse.
This dissertation first studies clustering techniques in depth and puts forward a new density-based and grid-based clustering algorithm, SCARG. The main idea of SCARG is to divide the data space into many rectangular cells; a cell whose density is greater than a threshold is a dense cell, and connected dense cells constitute a cluster. SCARG finds the primary portion of every cluster very quickly and then uses refinement techniques to build good clusters.
We also report experiments on both synthetic data and real data from the SEQUOIA 2000 benchmark, which validate the effectiveness and performance of SCARG compared with other well-known clustering algorithms (DBSCAN, CLARANS). To improve scalability, we also present PSCARG (Parallel SCARG), which is based on data partitioning.
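
The grid-and-density idea described in the abstract can be illustrated with a minimal sketch. This is not the SCARG algorithm itself: the abstract does not give its details, and the refinement step is omitted here. The function name grid_density_clusters, the parameters cell_size and density_threshold, the use of a point count as the density of a cell, and the choice of face-adjacency as the meaning of "connected" are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from math import floor


def grid_density_clusters(points, cell_size, density_threshold):
    """Illustrative grid/density clustering (hypothetical helper, not SCARG).

    points: iterable of equal-length numeric tuples.
    Returns a list of clusters, each a list of the original points.
    """
    # Step 1: divide the data space into rectangular cells by bucketing
    # every point into the cell that contains it.
    cells = defaultdict(list)
    for p in points:
        key = tuple(floor(c / cell_size) for c in p)
        cells[key].append(p)

    # Step 2: a cell is dense when its point count exceeds the threshold
    # (assumption: density is measured as a simple count per cell).
    dense = {k for k, pts in cells.items() if len(pts) > density_threshold}

    # Step 3: connected dense cells form one cluster. Here "connected"
    # is assumed to mean sharing a face (differing by 1 along one axis).
    seen = set()
    clusters = []
    for start in sorted(dense):  # sorted for a deterministic cluster order
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            cell = queue.popleft()
            members.extend(cells[cell])
            for axis in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(members)
    return clusters
```

Calling grid_density_clusters(points, 1.0, 2) on a list of 2-D tuples groups the points that fall in face-adjacent cells holding more than two points each; points in sparse cells are left out of every cluster, which is the role noise handling plays in density-based methods.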

Keywords/Search Tags:

Data Warehouse, Approximate Query Processing, Clustering Analysis, Data Cube, Data Compression

Related items

1	Design And Implementation Of Online Marketing Data Analysis Platform Based On The Materialized Data Cube
2	The Query Analysis System, The Revenue And Expenditure Based On Data Warehouse Technology
3	Research On Approximate Query Algorithm For Real-time Analysis Of Massive Data
4	Research On The Processing Of Multidimensional Data And Related Querying Technology In Data Warehouses
5	Research On The Compression-based Approximate Query Method For Massive Incomplete Data
6	Research On Approximate Query Processing Technology Based On Multidimensional Analysis Of Big Data
7	Research Of Approximate Query Processing Technology For Large Scale Data
8	Research On The Efficient Materialization And Fast Query Of Condensed Data Cube
9	Research And Implementation Of Online Multiple Aggregation Query System Over The Big Data
10	Data Warehouse In The Erp Application