Using domain knowledge to determine record usability in knowledge discovery

Posted on:2001-12-04

Degree:Ph.D

Type:Dissertation

University:Mississippi State University

Candidate:Wright, Margaret Brasfeild

Full Text:PDF

GTID:1468390014456639

Subject:Computer Science

Abstract/Summary:

In recent years, researchers and practitioners alike have established the power of knowledge discovery in databases (KDD). This process has been applied to many domains, including identifying space objects (Fayyad, Djorgovski, and Wei), targeting customer types with high risk or cross-sale potential (Haimowitz, Gur-Ali, and Schwarz 1997), and identifying faulty network circuits (Sasisekharan, Seshadri, and Weiss 1996). Success in these and many other diverse areas has spawned tremendous interest in applying KDD to other domains.

Before data can be mined for knowledge, it must be preprocessed to pare it down to a potentially useful and usable set. How this preprocessing is done is critical: paring away too much data can restrict and hamper the discovery of useful knowledge, while mining with redundant, missing, or inaccurate data can produce false or misleading results.

Several methods have been used to resolve the problem of missing data attributes. Typical methods of determining a record's usability are based on (1) the percentage of data attributes missing; (2) the assumption that all records are useful, with missing values ignored; or (3) the assumption that all records are useful, with missing values inferred (without considering individual attribute importance). This research introduces a model that determines the data mining usability of individual records from a combination of which attributes are missing and the relative importance of the various attributes as defined by the domain expert.

Knowledge-based instance selection (KbIS) is a model for determining record usability that incorporates domain knowledge about data attributes and is based on a fuzzy aggregation technique. When there are numerous missing data values, using KbIS for instance selection yields datasets that produce more accurate discovery results than datasets selected using a prescribed percentage of missing values (Famili 1997). Likewise, KbIS outperforms the default selection methods of discarding all records with missing values or using all records with missing values.
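
The contrast between a percentage-based cutoff and an importance-aware usability score can be illustrated with a small sketch. This is not the KbIS model itself: the abstract does not specify its fuzzy aggregation operator, so a plain importance-weighted average is used here as a stand-in. The attribute names, the importance weights, the threshold, and the function names are all hypothetical.

```python
from typing import Dict, List, Optional

# A record maps attribute names to values; None marks a missing value.
Record = Dict[str, Optional[float]]


def fraction_missing(record: Record, attributes: List[str]) -> float:
    """Baseline: share of attributes that are missing, ignoring importance."""
    missing = sum(1 for a in attributes if record.get(a) is None)
    return missing / len(attributes)


def usability_score(record: Record, importance: Dict[str, float]) -> float:
    """Importance-weighted share of attributes present, in [0, 1].

    Assumption: a weighted average stands in for the fuzzy aggregation
    technique, which the abstract does not describe in detail.
    """
    total = sum(importance.values())
    present = sum(w for a, w in importance.items() if record.get(a) is not None)
    return present / total


def select_usable(records: List[Record], importance: Dict[str, float],
                  threshold: float) -> List[Record]:
    """Keep records whose usability score meets the (hypothetical) threshold."""
    return [r for r in records if usability_score(r, importance) >= threshold]


# Hypothetical expert-assigned importance weights (higher = more important).
importance = {"age": 0.9, "income": 0.8, "zip": 0.1, "fax": 0.1}
attributes = list(importance)

# Both records are missing half of their attributes...
rec_a: Record = {"age": 42.0, "income": 55000.0, "zip": None, "fax": None}
rec_b: Record = {"age": None, "income": None, "zip": 39762.0, "fax": 5550100.0}

# ...so a percentage cutoff cannot tell them apart,
print(fraction_missing(rec_a, attributes), fraction_missing(rec_b, attributes))

# while the weighted score separates them (about 0.89 versus about 0.11).
print(usability_score(rec_a, importance), usability_score(rec_b, importance))

usable = select_usable([rec_a, rec_b], importance, threshold=0.5)
```

In this sketch the two records look identical to a percentage-of-missing-values rule, but only the record that retains the attributes the expert rated as important passes selection.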

Keywords/Search Tags:

Using, Data, Missing values, Discovery, Usability, Records, Domain

Related items

1	Researches On Imputation And Classification Of Incomplete Data Based On Variables For Missing Values
2	Researches On The Classification Of Imbalanced Data With Missing Values
3	Modeling Of Incomplete Data And Missing Values Imputations Based On Alternate Learning
4	Research On Key Technologies Of Data Cleaning Based On Crowdsourcing
5	Research On Missing Value Imputation Method Based On Mixed Information System
6	Research On Imputing Algorithm Of Missing Values Based On Kernel Similarity And Low Rank Approximation
7	Multiple Imputation on Missing Values in Time Series Data
8	The Research On Imputation Algorithm Of Missing Values For Gene Expression Data
9	Bayes nets: A generalized variable elimination algorithm and applications to classification
10	Imputation of missing values by integrating artificial neural networks and case-based reasoning