Incomplete data mining: A rough set approach

Posted on:2008-04-11

Degree:M.S.

Type:Thesis

University:University of Kansas

Candidate:Ajayi, Temidayo B

GTID:2458390005480721

Subject:Computer Science

Abstract/Summary:

A vast majority of datasets are incomplete, i.e., affected by missing attribute values. This makes it increasingly difficult to accurately diagnose a disease, evaluate a loan application, predict the outcome of an election, or perform other tasks that a data mining application could accomplish with a complete dataset. In reality, complete data is difficult, if not impossible, to acquire in most instances.

LERS (Learning from Examples based on Rough Sets) is a universal tool that induces a set of rules from examples and classifies new, unseen examples using the induced rules. LERS has undergone many modifications. This work uses one of the LERS modules, MLEM2, to study data with missing attribute values. To demonstrate the universality of LERS, the data used for this research spans different fields, such as politics and medicine.

Nine strategies, including concept lower and subset lower and upper approximations, are used in combination with three interpretations of missing attribute values: lost values, do-not-care conditions, and attribute-concept conditions. The results obtained from the experiments reveal that there is no universal best strategy for dealing with missing attribute values.
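The three interpretations of missing values and the lower and upper approximations named above can be illustrated with a small sketch. It is not the thesis's code and does not reproduce MLEM2 or the nine strategies. It assumes the characteristic-set definitions commonly used in the rough-set literature on incomplete data, with "?" marking a lost value, "*" a do-not-care condition, and "-" an attribute-concept condition. The toy dataset, attribute names, and function names are all hypothetical.

```python
LOST, DO_NOT_CARE, ATTR_CONCEPT = "?", "*", "-"
SPECIAL = {LOST, DO_NOT_CARE, ATTR_CONCEPT}

# Hypothetical toy data: (attribute values, decision). Not from the thesis.
CASES = [
    ({"temp": "high",   "cough": "yes"}, "flu"),      # case 0
    ({"temp": "?",      "cough": "yes"}, "flu"),      # case 1: lost value
    ({"temp": "*",      "cough": "no"},  "healthy"),  # case 2: do-not-care
    ({"temp": "normal", "cough": "-"},   "healthy"),  # case 3: attribute-concept
    ({"temp": "high",   "cough": "yes"}, "healthy"),  # case 4
]
ATTRS = ["temp", "cough"]


def concept_values(cases, i, attr):
    """Known values of attr among cases with the same decision as case i."""
    decision = cases[i][1]
    return {row[attr] for row, d in cases
            if d == decision and row[attr] not in SPECIAL}


def blocks(cases, attr):
    """Attribute-value blocks: for each known value, the cases matching it."""
    known = {row[attr] for row, _ in cases} - SPECIAL
    result = {v: set() for v in known}
    for i, (row, _) in enumerate(cases):
        v = row[attr]
        if v == LOST:
            continue                      # lost: the case joins no block
        if v == DO_NOT_CARE:
            targets = known               # do-not-care: joins every block
        elif v == ATTR_CONCEPT:
            targets = concept_values(cases, i, attr)  # values seen in its concept
        else:
            targets = {v}
        for t in targets:
            result[t].add(i)
    return result


def characteristic_set(cases, i, attrs):
    """Cases indistinguishable from case i, given what is known about case i."""
    k = set(range(len(cases)))
    for attr in attrs:
        v = cases[i][0][attr]
        if v in (LOST, DO_NOT_CARE):
            continue                      # no restriction from this attribute
        b = blocks(cases, attr)
        if v == ATTR_CONCEPT:
            vals = concept_values(cases, i, attr)
            if vals:
                k &= set().union(*(b[x] for x in vals))
        else:
            k &= b[v]
    return k


def approximations(cases, attrs, decision, kind="concept"):
    """Return (lower, upper) for a concept; kind is 'concept' or 'subset'."""
    concept = {i for i, (_, d) in enumerate(cases) if d == decision}
    candidates = concept if kind == "concept" else range(len(cases))
    lower, upper = set(), set()
    for i in candidates:
        k = characteristic_set(cases, i, attrs)
        if k <= concept:
            lower |= k                    # certainly in the concept
        if k & concept:
            upper |= k                    # possibly in the concept
    return lower, upper


if __name__ == "__main__":
    for kind in ("concept", "subset"):
        print(kind, approximations(CASES, ATTRS, "healthy", kind))
```

On this toy data, cases 0 and 4 share the same known values but have different decisions, so neither falls in a lower approximation. The concept and subset upper approximations of "healthy" also differ, which gives a small-scale picture of why the choice of approximation can change the rules that are induced.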

Keywords/Search Tags:

Missing attribute values, Data, LERS

Related items

1	Research On Classification Algorithm Of Decision Tree For Missing Data Based On Variable Precision Rough Set
2	Researches On Imputation And Classification Of Incomplete Data Based On Variables For Missing Values
3	Researches On The Classification Of Imbalanced Data With Missing Values
4	Modeling Of Incomplete Data And Missing Values Imputations Based On Alternate Learning
5	Research On Missing Value Imputation Method Based On Mixed Information System
6	Research On Imputing Algorithm Of Missing Values Based On Kernel Similarity And Low Rank Approximation
7	Multiple Imputation on Missing Values in Time Series Data
8	Study On Fuzzy Clustering For Incomplete Data Based On Probability Model Of Missing Attribute Values
9	The Research On Imputation Algorithm Of Missing Values For Gene Expression Data
10	Imputation of missing values by integrating artificial neural networks and case-based reasoning