Font Size: a A A

Approximate algorithms for data warehousing and data mining

Posted on:2002-12-03Degree:Ph.DType:Dissertation
University:George Mason UniversityCandidate:Wu, XintaoFull Text:PDF
GTID:1468390011490384Subject:Computer Science
Abstract/Summary:
A data cube is a popular organization for summary data. A cube is simply a multidimensional structure that contains in each cell an aggregate value, i.e., the result of applying an aggregate function to an underlying relation. Since in practical situations, cubes can require a large amount of storage, and cubes are used to support data analysis and analysts are rarely interested in the precise values of the aggregates (but rather in trends), providing approximate answers is, in most cases, a satisfactory compromise.; We develop and implement Quasi-Cubes system which model regions of the core cuboid and employ these models to estimate the values of the individual cells. The system also retain all the cell values whose estimations are farther away from the real value by more than a pre-established threshold to avoid incurring in large errors by the estimation. This threshold becomes the guarantee of the approximate answer has. We store the model parameters (for each modeled region of the cuboid) along with the retained cells to process the queries.; To implement Quasi-Cubes, we develop some algorithms such as partition algorithm which divides the whole cube space into chunks, an efficient way to compute, loglinear models [Agr96] which describe the dense chunks (here we focus on how to choose a concise model and make the algorithm scalable to large data cubes), a row shuffling based clustering algorithm which aims to increase the density of dense chunks. We also extend Quasi-Cubes to online approximate query system to decrease the latency of the users' queries. Some data mining techniques also benefit from Quasi-Cube, such as Exploratory Data Analysis (EDA) etc.
Keywords/Search Tags:Data, Approximate, Cube, Algorithm
Related items