Approximate algorithms for data warehousing and data mining

Posted on:2002-12-03

Degree:Ph.D

Type:Dissertation

University:George Mason University

Candidate:Wu, Xintao

GTID:1468390011490384

Subject:Computer Science

Abstract/Summary:

A data cube is a popular organization for summary data. A cube is simply a multidimensional structure in which each cell contains an aggregate value, i.e., the result of applying an aggregate function to an underlying relation. In practice, cubes can require a large amount of storage. Because cubes are used to support data analysis, and analysts are rarely interested in the precise values of the aggregates (but rather in trends), providing approximate answers is, in most cases, a satisfactory compromise.

We develop and implement the Quasi-Cubes system, which models regions of the core cuboid and uses these models to estimate the values of individual cells. To avoid incurring large estimation errors, the system also retains every cell whose estimate differs from the real value by more than a pre-established threshold. This threshold thus becomes the error guarantee of the approximate answer. To process queries, we store the model parameters for each modeled region of the cuboid along with the retained cells.

To implement Quasi-Cubes, we develop several algorithms: a partition algorithm that divides the whole cube space into chunks; an efficient method for computing the loglinear models [Agr96] that describe the dense chunks (here we focus on choosing a concise model and making the algorithm scalable to large data cubes); and a row-shuffling-based clustering algorithm that aims to increase the density of the dense chunks. We also extend Quasi-Cubes to an online approximate query system to reduce the latency of users' queries. Some data mining techniques, such as Exploratory Data Analysis (EDA), also benefit from Quasi-Cubes.
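The abstract does not spell out the model form or the query interface, so the following is only a minimal sketch of the "model plus retained cells" idea under stated assumptions. It treats one dense chunk as a 2-D table of SUM aggregates and uses the simplest loglinear model, the independence (main-effects) model log m_ij = u + u1_i + u2_j, whose maximum-likelihood fitted values have the closed form row_i * col_j / total; the dissertation's concise model selection is more elaborate. The names `compress`, `CompressedChunk`, `cell`, and `range_sum` are hypothetical. What it shows is the guarantee described above: every cell not retained is within the threshold of its true value, so a range sum is off by at most the threshold times the number of estimated cells.

```python
from dataclasses import dataclass

# Hypothetical layout: one dense chunk of the core cuboid as a 2-D table of SUMs.
Chunk = list[list[float]]


@dataclass
class CompressedChunk:
    row_totals: list[float]                 # model parameters: row margins
    col_totals: list[float]                 # model parameters: column margins
    grand_total: float
    threshold: float                        # per-cell absolute error guarantee
    retained: dict[tuple[int, int], float]  # cells the model estimates poorly

    def estimate(self, i: int, j: int) -> float:
        # Independence loglinear model; its ML fit is m_ij = row_i * col_j / total.
        if self.grand_total == 0:
            return 0.0
        return self.row_totals[i] * self.col_totals[j] / self.grand_total

    def cell(self, i: int, j: int) -> float:
        # Retained cells are exact; all others are within `threshold`.
        return self.retained.get((i, j), self.estimate(i, j))

    def range_sum(self, rows: range, cols: range) -> tuple[float, float]:
        # Approximate SUM over a rectangle plus a worst-case absolute error bound.
        total, estimated = 0.0, 0
        for i in rows:
            for j in cols:
                total += self.cell(i, j)
                if (i, j) not in self.retained:
                    estimated += 1
        return total, self.threshold * estimated


def compress(chunk: Chunk, threshold: float) -> CompressedChunk:
    # Fit the model (store margins only), then keep every cell it misses by > threshold.
    row_totals = [sum(row) for row in chunk]
    col_totals = [sum(col) for col in zip(*chunk)]
    model = CompressedChunk(row_totals, col_totals, sum(row_totals), threshold, {})
    for i, row in enumerate(chunk):
        for j, value in enumerate(row):
            if abs(value - model.estimate(i, j)) > threshold:
                model.retained[(i, j)] = value
    return model


if __name__ == "__main__":
    chunk = [
        [10.0, 20.0, 30.0],
        [20.0, 40.0, 60.0],
        [30.0, 60.0, 95.0],  # 95 breaks the otherwise independent pattern
    ]
    qc = compress(chunk, threshold=1.0)
    print(qc.retained)  # {(2, 2): 95.0}
    print(qc.range_sum(range(0, 2), range(0, 3)))
```

In this toy chunk only the perturbed cell exceeds the 1.0 threshold, so the stored state is six margin values plus one retained cell instead of nine cells. The reported bound is a worst case: because the independence model preserves the margins, a query covering whole rows happens to be exact here.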

Keywords/Search Tags:

Data, Approximate, Cube, Algorithm

Related items

1	Techniques Research For Data Cube Compression
2	Research On Approximate Query Processing Techniques In The DataWarehouse
3	Research And Implementation Of Construction And Query Techniques Of Histogram Data Cube Based On Hadoop
4	Data Cube Implementation Of Dimension Frequent Itemset
5	Research And Implementation Of Distributed Cube Distributed Storage And Construction Algorithm
6	OLAP Algorithm Research Based On Dimension Hierarchy For Data Cube
7	Research And Implementation Of Building Data Cube Based On Mapreduce
8	The Online Mining Of Data Cube Gradient
9	Research On Zero-Watermarking Algorithm For Data Cube
10	Research On Parallelled Data Cube Computing Method Based On Multi-core CPU