High-performance on-line analytical processing and data mining on parallel computers

Posted on:2000-04-06

Degree:Ph.D

Type:Dissertation

University:Northwestern University

Candidate:Goil, Sanjay

GTID:1468390014964063

Subject:Computer Science

Abstract/Summary:

Decision support systems are important in leveraging the information present in large-scale data repositories in many scientific and business applications. Data analysis and data mining on these warehouses pose new challenges for traditional database systems. On-Line Analytical Processing (OLAP) and data mining operations require summary information on these data sets. Query processing for these applications requires different views of the data for analysis and effective decision making. The multidimensional data model is a natural and intuitive approach for such applications. Data mining techniques can be applied in conjunction with OLAP for an integrated solution. As data warehouses grow, parallel processing techniques need to be applied to enable the use of larger data sets and reduce the time for analysis, thereby enabling evaluation of many more options for decision making.

In this dissertation we focus on parallel processing techniques for scalable OLAP and data mining. A scalable parallel multidimensional infrastructure for OLAP, integrated with data mining techniques such as association rules and classification, is designed and implemented. Multidimensional OLAP systems store data in a multidimensional structure on which analytical operations are performed. For large data sets and a large number of dimensions, multidimensional arrays are impractical, and other efficient sparse data structures and techniques are required. We introduce a bit-encoded sparse structure (BESS) for storage compression which allows aggregate operations on the compressed data. Pre-computed aggregate calculations in a data cube can provide efficient query processing for OLAP applications and data mining. We address the issues involved in the parallel construction and maintenance of partial and full data cubes, and in answering OLAP queries and performing data mining tasks using them. In particular, issues relating to the handling of large data sets, a large number of dimensions, sparse data structures, and parallelism are investigated. Algorithms are presented for our techniques, which are currently implemented on the IBM SP2 parallel machine and can be ported to other parallel platforms with minimal effort. Results show that our algorithms for OLAP and data mining on parallel systems scale to a large number of processors, a large number of dimensions, and large data sets, providing a high-performance platform for such applications.
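The abstract does not give the layout of BESS or the cube construction algorithms, so the following is only a minimal, sequential sketch of the general idea, written under stated assumptions: each non-empty cell's dimension indices are packed into one integer key, and a group-by is computed by masking away the bits of the aggregated dimensions, so the keys never have to be unpacked. The names `BitEncoder`, `aggregate`, and `full_cube` are hypothetical and are not taken from the dissertation, and the sketch ignores parallelism entirely.

```python
from collections import defaultdict
from itertools import combinations


def bits_needed(cardinality):
    """Bits required to store indices 0..cardinality-1 (at least 1)."""
    return max(1, (cardinality - 1).bit_length())


class BitEncoder:
    """Packs one index per dimension into a single integer key (assumed layout)."""

    def __init__(self, cardinalities):
        self.widths = [bits_needed(c) for c in cardinalities]
        self.offsets = []
        shift = 0
        for width in self.widths:
            # Dimension 0 occupies the lowest bits of the key.
            self.offsets.append(shift)
            shift += width

    def encode(self, indices):
        key = 0
        for index, offset in zip(indices, self.offsets):
            key |= index << offset
        return key

    def decode(self, key):
        return tuple(
            (key >> offset) & ((1 << width) - 1)
            for offset, width in zip(self.offsets, self.widths)
        )

    def mask(self, dims):
        """Bit mask that keeps only the bit fields of the given dimensions."""
        m = 0
        for d in dims:
            m |= ((1 << self.widths[d]) - 1) << self.offsets[d]
        return m


def aggregate(cells, encoder, keep_dims):
    """Sum cell values, grouping by keep_dims, working on the encoded keys."""
    mask = encoder.mask(keep_dims)
    out = defaultdict(int)
    for key, value in cells.items():
        out[key & mask] += value
    return dict(out)


def full_cube(cells, encoder):
    """Compute every group-by (all 2^d subsets of the d dimensions)."""
    n = len(encoder.widths)
    return {
        dims: aggregate(cells, encoder, dims)
        for r in range(n + 1)
        for dims in combinations(range(n), r)
    }


# Illustrative data: three dimensions with 4, 3, and 2 distinct values.
encoder = BitEncoder([4, 3, 2])
facts = {(0, 1, 0): 10, (0, 1, 1): 5, (3, 2, 0): 7, (3, 0, 1): 2}
cells = {encoder.encode(k): v for k, v in facts.items()}

cube = full_cube(cells, encoder)
# Totals grouped by dimension 0 only, with keys decoded for readability.
by_dim0 = {encoder.decode(k)[0]: v for k, v in cube[(0,)].items()}
```

Only non-empty cells are stored, which is what makes this kind of structure attractive when a dense multidimensional array would be mostly empty. A real implementation would also have to distinguish an aggregated-away dimension from the index value 0, which this sketch leaves to the caller by tracking which dimensions each group-by keeps.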

Keywords/Search Tags:

Data mining, On-line analytical processing, Parallel, Large data sets, Applications, Systems, Sparse data structures

Related items

1	Data Warehouse And Data Mining In The Securities Brokerage Business Crm Applications
2	The Research Of On Line Analytical Mining Technology And Its Application
3	Application And Research On Association Rule Mining Algorithm In Large Data Sets
4	On Olam Technology In Higher Education Funding For Projects In Yunnan Province
5	Optimization Research On Relation On-Line Analytical Processing Based On Rough Sets
6	Application Of Data Warehouse And Data Mining Technology In Tax Administration System
7	Decision Support System In Telecom Ip Data Warehouse And Data Mining Research And Applications
8	Applied spatial data structures for large data sets
9	A model to integrate data mining and on-line analytical processing: With application to real time process control
10	The Design And Application Of CRM Systems Based On Data Warehouse