Supervised rule discovery for rare events in mixed process data

Posted on: 2006-06-03
Degree: Ph.D.
Type: Dissertation
University: Arizona State University
Candidate: Berrado, Abdelaziz
Full Text: PDF
GTID: 1458390008454555
Subject: Engineering
Abstract/Summary:
Predictive technology makes it possible to anticipate outcomes early enough to take action and improve results. Many classification procedures have been developed for this purpose and have proven to have strong predictive power. Their interpretability, however, is unsatisfactory: they frequently fail to uncover the predictive structure of the problem, which is key to taking action and improving results.

Association rules mining is one of the major successes of data mining. It is widely used to find interesting associations between attributes in massive, high-dimensional categorical feature spaces. This work proposes a new approach for finding highly actionable rules, using existing association rules mining algorithms, to explain the occurrence of events in mixed high-dimensional manufacturing data. The approach searches for association rules whose consequent matches the event of interest. The research addresses several limitations of association rules mining from process data, namely the large number of rules and the heterogeneity of the predictors.

The high dimensionality of massive data leads to the discovery of a large number of association rules, many of which are redundant and contained in other rules. The sparseness of the data affects the redundancy and containment between the rules. A new methodology is provided for organizing and grouping the association rules that share the same consequent.

Supervised association rules mining from a heterogeneous data space requires discretizing the continuous attributes. This step should be carried out with minimal information loss. A novel discretization algorithm called Random Forests Discretizer (RFDisc) is introduced in this work. Its ability to preserve the properties of the data derives from the Random Forests learning algorithm.

Finally, supervised association rules, along with their corresponding metarules, are used for clustering in a categorical feature space. Clustering algorithms partition data sets into groups of objects such that the pairwise similarity between objects within the same cluster is higher than that between objects assigned to different clusters. Defining a similarity measure becomes challenging in the presence of categorical data. This work introduces an algorithm called Supervised Clustering with Association Rules (SCAR) for clustering massive, high-dimensional categorical data.
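
As an illustration of the idea of mining association rules whose consequent is fixed to the event of interest, the following is a minimal Python sketch. It is not the dissertation's method: it uses a brute-force enumeration of antecedents rather than an existing association rules mining algorithm, and the function name `mine_event_rules`, the toy records, and the thresholds are all assumptions made for this example. For rare events, the minimum support threshold would typically need to be set low.

```python
from itertools import combinations


def mine_event_rules(records, event_key, min_support, min_confidence, max_len=2):
    """Enumerate rules 'antecedent -> event' over categorical records.

    Hypothetical helper for illustration only (brute force, not Apriori).
    records: list of dicts mapping attribute name -> categorical value,
             plus a boolean entry under event_key.
    Returns a list of (antecedent, support, confidence) tuples.
    """
    n = len(records)
    # Candidate items are (attribute, value) pairs, excluding the event column.
    items = sorted(
        {(k, v) for r in records for k, v in r.items() if k != event_key}
    )
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            # An antecedent may use each attribute at most once.
            if len({attr for attr, _ in antecedent}) < size:
                continue
            covered = [
                r for r in records
                if all(r.get(attr) == val for attr, val in antecedent)
            ]
            if not covered:
                continue
            hits = sum(1 for r in covered if r[event_key])
            support = hits / n                # P(antecedent and event)
            confidence = hits / len(covered)  # P(event | antecedent)
            if support >= min_support and confidence >= min_confidence:
                rules.append((antecedent, support, confidence))
    return rules


# Toy process data (invented for this sketch); "defect" is the event of interest.
records = [
    {"machine": "A", "shift": "night", "defect": True},
    {"machine": "A", "shift": "night", "defect": True},
    {"machine": "A", "shift": "day", "defect": False},
    {"machine": "B", "shift": "night", "defect": False},
    {"machine": "B", "shift": "day", "defect": False},
]

rules = mine_event_rules(records, "defect", min_support=0.3, min_confidence=0.9)
for antecedent, support, confidence in rules:
    print(antecedent, "-> defect", round(support, 2), round(confidence, 2))
```

Because the consequent is fixed, every rule returned explains the same event, which is what makes grouping rules by shared consequent, as described in the abstract, a natural next step.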
Keywords/Search Tags: Data, Association rules, Supervised, Categorical, Clustering