Data Mining Using Contrast-sets: A Comparative Study

Posted on: 2012-05-12
Degree: M.Sc
Type: Thesis
University: University of Alberta (Canada)
Candidate: Satsangi, Amit
Full Text: PDF
GTID: 2468390011964306
Subject: Computer Science
Abstract/Summary:
Comparative analysis is an essential part of understanding how and why things work the way they do. How have the rich fared in comparison to the poor in the last decade? Why do we find more men than women in Science and Engineering? Do postgraduate degree holders really earn more money than those with an undergraduate degree? Why do some customers prefer to buy online while others do not? What factors contribute to pre-term births? Why are some students more successful than others? All of the above questions require comparison between various classes. Contrast-set mining was first proposed as a way to identify the attributes that significantly differentiate between various groups (or classes) in discrete data. It has since been applied in many fields to find contrast-sets (conjunctions of attribute-value pairs) that help differentiate between groups; however, no clear picture has emerged of how to extract the contrast-sets that discriminate most between the classes. Various interestingness and usefulness measures have been proposed as part of different contrast-set mining techniques, each claiming to find more meaningful contrast-sets than the technique before it. It has been shown in the literature that contrast-set mining is a special case of the rule-discovery task; in this thesis we address the problem of finding meaningful contrast-sets by applying a methodology based on the foundation of contrast-set mining: Association Rule Mining. Among the many surprising results that we obtain, we report a family of contrast-sets that was not previously known in the literature. We also show why we should expect contrast-sets of only a certain kind for any data. Finally, we present the results of our experiments and compare them with those of the well-known contrast-set mining algorithm, STUCCO.
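
As a rough illustration of the contrast-set notion described in the abstract, the sketch below treats a contrast-set as a conjunction of attribute-value pairs and compares its support (the fraction of matching records) across groups. It is not the methodology of the thesis or the STUCCO algorithm: the records, attribute names, and the helper functions `support` and `support_by_group` are hypothetical, and a real miner would also search over candidate sets and test the differences for statistical significance.

```python
from typing import Dict, List

Record = Dict[str, str]


def support(records: List[Record], contrast_set: Dict[str, str]) -> float:
    """Fraction of records that match every attribute-value pair in the set."""
    if not records:
        return 0.0
    hits = sum(
        all(r.get(attr) == value for attr, value in contrast_set.items())
        for r in records
    )
    return hits / len(records)


def support_by_group(
    data: List[Record], group_attr: str, contrast_set: Dict[str, str]
) -> Dict[str, float]:
    """Split the data on group_attr and compute the set's support in each group."""
    groups: Dict[str, List[Record]] = {}
    for r in data:
        groups.setdefault(r[group_attr], []).append(r)
    return {g: support(rs, contrast_set) for g, rs in groups.items()}


# Hypothetical toy data: the grouping attribute is "degree".
data = [
    {"degree": "postgraduate", "field": "science", "income": "high"},
    {"degree": "postgraduate", "field": "arts", "income": "high"},
    {"degree": "postgraduate", "field": "science", "income": "low"},
    {"degree": "postgraduate", "field": "science", "income": "high"},
    {"degree": "undergraduate", "field": "science", "income": "low"},
    {"degree": "undergraduate", "field": "arts", "income": "low"},
    {"degree": "undergraduate", "field": "science", "income": "high"},
    {"degree": "undergraduate", "field": "arts", "income": "low"},
]

# Candidate contrast-set: a conjunction of two attribute-value pairs.
candidate = {"field": "science", "income": "high"}

supports = support_by_group(data, "degree", candidate)
# Largest gap in support between any two groups.
difference = max(supports.values()) - min(supports.values())

print(supports)
print(difference)
```

A candidate whose support differs widely between groups is the kind of pattern a contrast-set miner looks for; how large and how significant the gap must be depends on the interestingness measure chosen.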
Keywords/Search Tags: Mining, Contrast-set, Data