
Data warehouse operational design: View selection and performance simulation

Posted on: 2006-06-27
Degree: Ph.D.
Type: Dissertation
University: The University of Toledo
Candidate: Agrawal, Vikas R
Full Text: PDF
GTID: 1458390008470764
Subject: Business Administration
Abstract/Summary:
Decision support systems are key to gaining competitive advantage. Many corporations have built or are building unified decision-support databases, called data warehouses, on which decision makers can carry out their analysis. A data warehouse is a very large database that integrates information extracted from multiple independent, heterogeneous data sources to support business analysis and decision-making. Data that is likely to be in demand is generally precomputed and stored ahead of time in the data warehouse as materialized views. This dramatically reduces the execution time of decision support queries, from hours or days to minutes or even seconds.

There are many architectural issues concerning the efficient design of a data warehouse. This dissertation studies three of them in depth. The first is the Materialized View Selection (MVS) problem: choosing an optimal set of views to materialize under resource constraints. We formulate bottleneck versions of this problem and present 0-1 integer programming models and heuristic procedures for them, along with a performance analysis of the heuristics.

Formulating the MVS problem requires knowing the number of rows in each view of a given lattice structure, that is, the views and their interrelationships for a given set of dimensions. Counting the actual number of rows in each view takes considerable time. The second issue is therefore the application of statistical sampling techniques to data warehouses to estimate the number of rows in each view of a given lattice structure. We show that sampling yields significant time savings without compromising accuracy.

The third issue is modeling the behavior and performance of a data warehouse system using simulation. We implemented the model in ARENA. The model enables a data warehouse manager to walk through various scenarios to investigate the synergy among system components and to identify areas of inefficiency in the system, which could also help improve the overall performance of the data warehouse system.
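
To make the lattice structure and the view selection problem concrete, here is a minimal Python sketch. It is not the dissertation's formulation: it uses a generic greedy heuristic that minimizes total query cost under a row budget, not the bottleneck 0-1 integer programming models the abstract describes. The dimension names, row counts, budget, and function names are all hypothetical, and the cost model (a query costs as many rows as the smallest materialized view that can answer it) is an assumption chosen for illustration.

```python
from itertools import combinations


def lattice(dimensions):
    """Return every view in the lattice: one frozenset per subset of dimensions."""
    dims = list(dimensions)
    return [frozenset(c) for r in range(len(dims) + 1) for c in combinations(dims, r)]


def query_cost(query, materialized, rows):
    """Assumed cost model: rows in the smallest materialized view containing the query's dimensions."""
    return min(rows[v] for v in materialized if query <= v)


def greedy_select(rows, top, budget):
    """Generic greedy heuristic (illustrative only).

    The top view is always materialized. Each round adds the view that gives
    the largest reduction in total query cost while the extra views still fit
    within `budget` rows.
    """
    chosen = {top}
    used = 0
    views = list(rows)
    while True:
        base = sum(query_cost(q, chosen, rows) for q in views)
        best, best_gain = None, 0
        for v in views:
            if v in chosen or used + rows[v] > budget:
                continue
            gain = base - sum(query_cost(q, chosen | {v}, rows) for q in views)
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:
            return chosen
        chosen.add(best)
        used += rows[best]


# Hypothetical three-dimension warehouse with made-up row counts per view.
dims = ("product", "store", "time")
top = frozenset(dims)
rows = {
    frozenset({"product", "store", "time"}): 1000,
    frozenset({"product", "store"}): 800,
    frozenset({"product", "time"}): 600,
    frozenset({"store", "time"}): 900,
    frozenset({"product"}): 50,
    frozenset({"store"}): 20,
    frozenset({"time"}): 30,
    frozenset(): 1,
}

selected = greedy_select(rows, top, budget=700)
for view in sorted(selected, key=len, reverse=True):
    print(sorted(view), rows[view])
```

With n dimensions the lattice has 2^n views, which is why the row count of every view is an input to any such selection procedure and why counting those rows exactly becomes expensive as dimensions are added.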
Keywords/Search Tags: Data, Performance, System, View