Architectures and optimizations for integrating data mining algorithms with database systems

Posted on:1999-08-16

Degree:Ph.D

Type:Dissertation

University:University of Florida

Candidate:Thomas, Shiby

GTID:1468390014472576

Subject:Computer Science

Abstract/Summary:

Data mining on large data warehouses is becoming increasingly important. In support of this trend, we consider a spectrum of architectural alternatives for integrating mining with database systems. These alternatives include loose-coupling through a SQL cursor interface; encapsulation of the mining algorithm in a stored procedure; caching the data to a file system on the fly and mining it; tight-coupling using primarily user-defined functions; and SQL implementations for processing in the DBMS. First, we comprehensively study the option of expressing the association rule mining algorithm in the form of SQL queries. We consider four options in SQL-92 and six options in SQL enhanced with object-relational extensions (SQL-OR). Our evaluation of the different architectural alternatives shows that, from a performance perspective, the Cache-Mine option is superior, although the SQL-OR option comes a close second. Both the Cache-Mine and the SQL-OR approaches incur a higher storage penalty than the loose-coupling approach, which performance-wise is a factor of 3 to 4 worse than Cache-Mine. We also compare these alternatives on the basis of qualitative factors like automatic parallelization, development ease, portability and interoperability.

We further analyze the SQL-92 approaches with the twin goals of studying how a DBMS without any object-relational extensions can best execute these queries and identifying ways of incorporating the semantics of mining into cost-based query optimizers. We develop cost formulae for the mining queries based on the input data parameters and relational operator costs. We also identify certain optimizations that improve performance. Next, we study generalized association rule and sequential pattern mining and develop SQL formulations for them, thereby demonstrating that more complex mining operations can be handled in the SQL framework.

We develop an incremental association rule mining algorithm that does not need to examine the old data if the frequent itemsets do not change. Even when they do change, access to the old database can be limited to just one scan. We categorize the various kinds of constraints on the items that are useful in the context of interactive mining to facilitate goal-oriented mining. We show how the incremental mining technique can be adapted to handle constraints and certain kinds of constraint relaxation. We also show the applicability of the incremental algorithm to other classes of data mining and decision support problems. Finally, we identify certain primitive operators that are useful for a large class of data mining and decision support applications. Supporting them natively in the DBMS could enable these applications to run faster.
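
The abstract's idea of expressing association rule mining as SQL queries can be illustrated with a minimal sketch of one step, support counting for 2-itemsets. This is not the dissertation's formulation: the table `sales(tid, item)`, the function name `frequent_pairs`, and the sample transactions are all assumptions made for illustration, and SQLite (through Python's standard `sqlite3` module) stands in for whatever DBMS the work used.

```python
import sqlite3


def frequent_pairs(transactions, min_support):
    """Return (item_a, item_b, support) for item pairs found in at least
    min_support transactions, counted with a SQL self-join.

    Hypothetical helper; the schema sales(tid, item) is an assumption.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
    # One row per (transaction, item); set() drops duplicate items
    # so a transaction is never counted twice for the same pair.
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(tid, item) for tid, items in transactions.items() for item in set(items)],
    )
    rows = conn.execute(
        """
        SELECT a.item, b.item, COUNT(*) AS support
        FROM sales a JOIN sales b
          ON a.tid = b.tid AND a.item < b.item  -- each unordered pair once
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY a.item, b.item
        """,
        (min_support,),
    ).fetchall()
    conn.close()
    return rows


# Made-up sample data: transaction id -> items bought together.
transactions = {
    1: ["bread", "milk"],
    2: ["bread", "butter", "milk"],
    3: ["butter", "milk"],
    4: ["bread", "butter"],
}
pairs = frequent_pairs(transactions, min_support=2)
for item_a, item_b, support in pairs:
    print(item_a, item_b, support)
```

The join condition `a.item < b.item` keeps each unordered pair once, and the `HAVING` clause applies the minimum-support threshold inside the database rather than in application code. Larger itemsets would need further joins of the same shape, which is where query cost becomes the central concern.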

Keywords/Search Tags:

Mining, DBMS, SQL, Support

Related items

1	P2: A lightweight DBMS generator
2	Mode Conversion Between The Dbms Technology Applied Research
3	Research On Buffer Management For Flash-based DBMS
4	Design And Implementation Of Dynamic-tinning Of Resource Of The DBMS
5	The Design And Realization Of Database Encryption System
6	The Research On Maintenance And Decision Support System For Electric Power Plant Equipments Based On Data Mining
7	The Research Of The DBMS Generator Based On The Component
8	Research On Survival Secure DBMS And Its Key Technologies
9	Design And Implementation Of Parallel Encrypted DBMS Based On CryptDB
10	Association Rules Candidates To Support The Study Of The Frequency