Novel techniques for data warehousing and online analytical processing in emerging applications

Posted on: 2007-03-29
Degree: Ph.D.
Type: Dissertation
University: State University of New York at Buffalo
Candidate: Cho, Moonjung
Full Text: PDF
GTID: 1458390005491232
Subject: Computer Science
Abstract/Summary:
A data warehouse is a collection of data that supports the decision-making process. Data cubes and on-line analytical processing (OLAP) have become very popular techniques for helping users analyze data in a warehouse. Although data warehouses and data cubes have been studied extensively, emerging applications pose technical challenges that have not been addressed well.

We propose effective and efficient solutions to challenging problems in three areas: (1) mining iceberg cubes from multiple tables, (2) answering ad-hoc aggregate queries on data streams online, and (3) warehousing pattern-based clusters.

First, we argue that the materialized base table assumed in most previous studies on computing iceberg cubes is often infeasible in practice. Instead, a data warehouse is usually organized as multiple tables in a schema such as a star schema, snowflake schema, or constellation schema. We propose a novel approach that computes an iceberg cube from multiple tables in a data warehouse, avoiding the costly materialization of a base table.

Second, it is infeasible to compute a full data cube for answering ad-hoc aggregate queries on data streams, because the data arrive rapidly and the volume is huge. We develop a new method for answering ad-hoc aggregate queries on data streams online, which maintains and indexes a small subset of aggregate cells in a specially designed data structure.

Last, we extend data warehousing and OLAP techniques to handle pattern-based clusters. We propose an efficient method for constructing a data warehouse of non-redundant pattern-based clusters.
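
For readers unfamiliar with the term, an iceberg cube keeps only those group-by cells whose aggregate meets a minimum threshold. The following is a minimal illustrative sketch in Python, not the dissertation's algorithm: it counts cells naively over every subset of dimensions, and it reads a star schema by looking up dimension attributes row by row instead of first storing a joined base table. The table contents and the names `iceberg_cube`, `joined_rows`, `fact`, `store`, and `product` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def iceberg_cube(rows, dims, min_support):
    """Naive iceberg cube: count every group-by cell, keep cells with count >= min_support."""
    counts = Counter()
    for row in rows:
        # Each row contributes to one cell in every group-by (every subset of dims).
        for k in range(len(dims) + 1):
            for subset in combinations(dims, k):
                cell = tuple((d, row[d]) for d in subset)
                counts[cell] += 1
    return {cell: n for cell, n in counts.items() if n >= min_support}


# Hypothetical star schema: one fact table and two dimension tables.
fact = [
    {"store_id": 1, "product_id": 10},
    {"store_id": 1, "product_id": 11},
    {"store_id": 2, "product_id": 10},
    {"store_id": 1, "product_id": 10},
]
store = {1: {"city": "Buffalo"}, 2: {"city": "Albany"}}
product = {10: {"category": "book"}, 11: {"category": "toy"}}


def joined_rows(fact, store, product):
    """Yield joined rows one at a time; the joined base table is never stored."""
    for f in fact:
        yield {
            "city": store[f["store_id"]]["city"],
            "category": product[f["product_id"]]["category"],
        }


cube = iceberg_cube(joined_rows(fact, store, product), ["city", "category"], min_support=2)
for cell, n in sorted(cube.items(), key=lambda item: len(item[0])):
    print(cell, n)
```

With a threshold of 2, only cells supported by at least two fact rows survive; the empty cell `()` is the grand total. This naive version enumerates every cell before filtering, which is exactly the cost that practical iceberg cube algorithms try to avoid through pruning.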
Keywords/Search Tags:Data warehouse, Data warehousing, Analytical processing, Iceberg cube from multiple tables, Pattern-based clusters, Answering ad-hoc aggregate queries, Data cube, Techniques