Supervised rule discovery for rare events in mixed process data

Posted on:2006-06-03

Degree:Ph.D

Type:Dissertation

University:Arizona State University

Candidate:Berrado, Abdelaziz


GTID:1458390008454555

Subject:Engineering

Abstract/Summary:

Predictive technology enables one to predict the future in time to take action and improve results. A plethora of classification procedures have been developed for this purpose and have been shown to have strong predictive power. Their interpretability, however, is unsatisfactory: they frequently fail to uncover the predictive structure of the problem, which is key to taking action and improving results.

Association rule mining is one of the major successes of data mining. It is widely used to find interesting associations between attributes in massive, high-dimensional categorical feature spaces. This work proposes a new approach that uses existing association rule mining algorithms to find highly actionable rules explaining the occurrence of events in mixed, high-dimensional manufacturing data. Specifically, it seeks association rules whose consequent matches the event of interest. The research addresses several limitations of association rule mining from process data, namely the large number of rules and the heterogeneity of the predictors.

The high dimensionality of massive data leads to the discovery of a large number of association rules, many of which are redundant or contained in other rules. The sparseness of the data affects the redundancy and containment between the rules. A new methodology is provided for organizing and grouping association rules that share the same consequent.

Supervised association rule mining from a heterogeneous data space requires discretizing the continuous attributes, and this step should be carried out with minimal information loss. A novel discretization algorithm called Random Forests Discretizer (RFDisc) is introduced in this work. It derives its ability to preserve the properties of the data from the Random Forests learning algorithm.

Finally, supervised association rules, along with their corresponding metarules, are used for clustering in a categorical feature space. Clustering algorithms partition data sets into groups of objects such that the pairwise similarity between objects within the same cluster is higher than that between objects assigned to different clusters. Defining a similarity measure becomes challenging in the presence of categorical data. This work introduces an algorithm called Supervised Clustering with Association Rules (SCAR) for clustering massive, high-dimensional categorical data.
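The general idea of supervised rules, containment, and rule-based grouping can be illustrated with a small sketch. It is not the dissertation's method and not SCAR. It assumes the continuous attributes have already been discretized (the step RFDisc addresses). It enumerates rules of the form X → event by brute force, whereas the abstract states that existing association rule mining algorithms are used. It drops a rule when a strictly more general rule reaches at least the same confidence, which is one common redundancy criterion. Finally, it groups records by the exact set of rules they satisfy as a simple stand-in for rule-based clustering. The function names, thresholds, and toy records are hypothetical.

```python
from itertools import combinations

# Hypothetical toy process records: attributes are already discretized
# (string values), and "event" flags the event of interest.
records = [
    {"temp": "high", "line": "A", "event": True},
    {"temp": "high", "line": "B", "event": True},
    {"temp": "high", "line": "A", "event": True},
    {"temp": "low", "line": "A", "event": False},
    {"temp": "low", "line": "B", "event": False},
    {"temp": "low", "line": "B", "event": False},
    {"temp": "high", "line": "B", "event": False},
    {"temp": "low", "line": "A", "event": False},
]


def matches(record, antecedent):
    """True if the record satisfies every (attribute, value) item."""
    return all(record.get(a) == v for a, v in antecedent)


def mine_supervised_rules(records, event_key, min_support, min_confidence, max_len=2):
    """Brute-force search for rules X -> event (consequent fixed to the event).

    support    = fraction of all records that satisfy X and have the event
    confidence = fraction of records satisfying X that have the event
    """
    n = len(records)
    # Candidate items; assumes all attribute values are mutually comparable strings.
    items = sorted({(k, v) for r in records for k, v in r.items() if k != event_key})
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            # Skip antecedents that test two values of the same attribute.
            if len({a for a, _ in antecedent}) < size:
                continue
            covered = [r for r in records if matches(r, antecedent)]
            hits = sum(1 for r in covered if r[event_key])
            support = hits / n
            if not covered or support < min_support:
                continue
            confidence = hits / len(covered)
            if confidence >= min_confidence:
                rules.append((frozenset(antecedent), support, confidence))
    return rules


def prune_contained(rules):
    """Drop a rule if a strictly more general rule (proper-subset antecedent)
    reaches at least the same confidence."""
    return [
        (ant, sup, conf)
        for ant, sup, conf in rules
        if not any(other < ant and oc >= conf for other, _, oc in rules)
    ]


def cluster_by_rule_coverage(records, rules):
    """Group record indices by the exact set of rules whose antecedent they satisfy."""
    clusters = {}
    for idx, r in enumerate(records):
        signature = frozenset(i for i, (ant, _, _) in enumerate(rules) if matches(r, ant))
        clusters.setdefault(signature, []).append(idx)
    return clusters


if __name__ == "__main__":
    kept = prune_contained(mine_supervised_rules(records, "event", 0.1, 0.5))
    for ant, sup, conf in kept:
        print(sorted(ant), "-> event", f"support={sup:.3f}", f"confidence={conf:.2f}")
    print(cluster_by_rule_coverage(records, kept))
```

In this toy data, the rule {line=B, temp=high} → event is pruned because the more general rule {temp=high} → event has higher confidence. A real rare-event setting would use a much lower support threshold and an efficient miner instead of exhaustive enumeration.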

Keywords/Search Tags:

Data, Association rules, Supervised, Categorical, Clustering

Related items

1	Research On The Optimization Of Association Rules
2	A Study On Clustering Algorithms For Categorical Data With Applications
3	Automatic categorical data clustering and spatial data clustering by consecutive resolution refinement
4	The Research On Clustering Algorithm For Categorical Data Using Quantum Mechanics
5	Studies On Clustering Algorithms For Categorical Data
6	Association Rules Mining And Its Applications In Microarray Gene Expression Data
7	Research And Implementation Of Clustering Method For High Dimensional Categorical Data
8	Research On The Algorithm Of Telecom Business Association Rules
9	Study Of Algorithms For Clustering Categorical Data
10	Research On Web Information Retrieval Based On Data Mining