
Incomplete data mining: A rough set approach

Posted on: 2008-04-11
Degree: M.S.
Type: Thesis
University: University of Kansas
Candidate: Ajayi, Temidayo B
Full Text: PDF
GTID: 2458390005480721
Subject: Computer Science
Abstract/Summary:
A vast majority of datasets are incomplete, i.e., affected by missing attribute values. Missing values make it more difficult to accurately diagnose a disease, evaluate a loan application, predict the outcome of an election, or perform other tasks that a data mining application could accomplish with a complete dataset. In practice, complete data is difficult, if not impossible, to acquire in most instances.

LERS (Learning from Examples based on Rough Sets) is a universal tool that induces a set of rules from examples and classifies new, unseen examples using the induced rules. LERS has undergone many modifications. This work uses one of the modules of LERS, MLEM2, to study data with missing attribute values. To demonstrate the universality of LERS, the data used for this research spans different fields, such as politics and medicine.

Nine strategies, including concept lower and subset lower and upper approximations, are used in combination with three interpretations of missing attribute values: lost values, do-not-care conditions, and attribute-concept conditions. The results obtained from the experiments reveal that there is no universal best strategy for dealing with missing attribute values.
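
The abstract does not spell out the nine strategies, so the following is only a minimal sketch of how the three interpretations of missing values can feed into concept lower and upper approximations. It assumes the characteristic-set formulation commonly used in the rough set literature, which may differ in detail from the thesis. The symbols "?", "*" and "-", the function names, and the four-case toy table are all assumptions made for illustration; they are not taken from LERS or MLEM2.

```python
# Illustrative sketch only; not the LERS/MLEM2 implementation.
# Assumed notation: "?" = lost, "*" = do-not-care, "-" = attribute-concept.
LOST, DONT_CARE, ATTR_CONCEPT = "?", "*", "-"
MISSING = {LOST, DONT_CARE, ATTR_CONCEPT}


def concept_values(cases, decisions, attr, x):
    """Known values of attr among cases sharing x's decision (concept)."""
    same = (row[attr] for y, row in cases.items() if decisions[y] == decisions[x])
    return set(same) - MISSING


def blocks(cases, decisions, attr):
    """Map each known value v of attr to the set of cases in block [(attr, v)]."""
    known = {row[attr] for row in cases.values()} - MISSING
    blk = {v: set() for v in known}
    for x, row in cases.items():
        val = row[attr]
        if val == LOST:
            continue  # a lost value places the case in no block
        if val == DONT_CARE:
            targets = known  # a do-not-care value matches every known value
        elif val == ATTR_CONCEPT:
            targets = concept_values(cases, decisions, attr, x)
        else:
            targets = {val}
        for v in targets:
            blk[v].add(x)
    return blk


def characteristic_set(cases, decisions, attrs, x):
    """Cases that cannot be told apart from x using the attributes in attrs."""
    universe = set(cases)
    result = set(universe)
    for a in attrs:
        val = cases[x][a]
        blk = blocks(cases, decisions, a)
        if val in (LOST, DONT_CARE):
            k = universe  # the attribute imposes no restriction
        elif val == ATTR_CONCEPT:
            vals = concept_values(cases, decisions, a, x)
            k = set().union(*(blk[v] for v in vals)) if vals else universe
        else:
            k = blk[val]
        result &= k
    return result


def concept_approximations(cases, decisions, attrs, concept):
    """Return (lower, upper) concept approximations of one decision class."""
    members = {x for x in cases if decisions[x] == concept}
    k_sets = {x: characteristic_set(cases, decisions, attrs, x) for x in members}
    lower = set().union(*(k for k in k_sets.values() if k <= members))
    upper = set().union(*k_sets.values())
    return lower, upper


# Hypothetical toy table: case 2 has a lost value, case 3 a do-not-care value.
cases = {
    1: {"temp": "high", "headache": "yes"},
    2: {"temp": "?", "headache": "yes"},
    3: {"temp": "high", "headache": "*"},
    4: {"temp": "normal", "headache": "no"},
}
decisions = {1: "flu", 2: "flu", 3: "no", 4: "no"}
attrs = ["temp", "headache"]

flu_lower, flu_upper = concept_approximations(cases, decisions, attrs, "flu")
no_lower, no_upper = concept_approximations(cases, decisions, attrs, "no")
print(flu_lower, flu_upper)  # set() {1, 2, 3}
print(no_lower, no_upper)    # {4} {1, 3, 4}
```

In this toy table no "flu" case can be classified with certainty, because cases 1 and 2 are indistinguishable from case 3, so the lower approximation is empty while the upper approximation covers three cases. Rules induced from a lower approximation are certain, and rules induced from an upper approximation are only possible.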
Keywords/Search Tags: Missing attribute values, Data, LERS