Extending association analysis

Posted on: 2006-10-26
Degree: Ph.D.
Type: Thesis
University: University of Minnesota
Candidate: Steinbach, Michael Scott
Full Text: PDF
GTID: 2458390008470522
Subject: Computer Science
Abstract/Summary:
Data mining is a relatively new area of data analysis that has arisen in response to new data analysis challenges, such as those posed by massive data sets or non-traditional types of data. Association analysis, which seeks to find patterns that describe the relationships among attributes (variables) in a binary data set, is an area of data mining that has created a unique set of data analysis tools and concepts that have been widely employed in business and science. This thesis extends association analysis to address some well-known problems and to define some new and potentially more useful types of association patterns. More specifically, the contributions of this thesis fall into three main areas: (1) creating a framework that allows association analysis to be directly applied to non-binary data and non-traditional association patterns, (2) creating an approach for defining a wide variety of new and potentially more useful association patterns, and (3) introducing a new association pattern, the support envelope, that is useful for the exploratory data analysis of association patterns. The extension of association analysis to non-binary data and non-traditional association patterns is accomplished by generalizing the notions of support and confidence, which are measures traditionally used to evaluate the strength of association patterns for binary data. The creation of new types of association patterns is based on a technique that can create a new association measure from any pairwise measure of association or proximity. Since there are many such measures, a wide variety of new association patterns can be created. Finally, support envelopes can be used to provide a global view of the structure of the association patterns in a data set, including association patterns with low support that are typically difficult to detect. Furthermore, using support envelopes, the association structure of a data set can be represented graphically as a two-dimensional scatter plot, a feature that is useful in the exploratory analysis of association patterns.
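For readers unfamiliar with the two measures the abstract says are generalized, the following is a minimal sketch of support and confidence in their traditional binary (transaction) form only. It does not reproduce the thesis's generalized framework, which the abstract does not spell out. The function names and the toy transactions are illustrative assumptions, not taken from the thesis.

```python
from typing import Iterable, List, Set

# Each transaction is the set of items (binary attributes) that are "present".
Transaction = Set[str]


def support(transactions: List[Transaction], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[Transaction],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Confidence of the rule antecedent -> consequent:
    support(antecedent and consequent together) / support(antecedent)."""
    ante = set(antecedent)
    both = ante | set(consequent)
    s_ante = support(transactions, ante)
    if s_ante == 0:
        return 0.0  # rule is undefined when the antecedent never occurs
    return support(transactions, both) / s_ante


# Hypothetical toy data, for illustration only.
transactions: List[Transaction] = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

print(support(transactions, {"bread", "milk"}))       # 2 of 4 transactions
print(confidence(transactions, {"bread"}, {"milk"}))  # 2 of the 3 bread transactions
```

In this toy data, {bread, milk} appears in two of four transactions, and two of the three transactions containing bread also contain milk.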
Keywords/Search Tags: Association, Data, New, Useful