Using domain knowledge to determine record usability in knowledge discovery

Posted on: 2001-12-04
Degree: Ph.D
Type: Dissertation
University: Mississippi State University
Candidate: Wright, Margaret Brasfeild
Full Text: PDF
GTID: 1468390014456639
Subject: Computer Science
Abstract/Summary:
In recent years, researchers and practitioners alike have established the power of knowledge discovery in databases (KDD). The process has been applied to many domains, including identifying space objects (Fayyad, Djorgovski, and Wei), targeting customer types with high risk or cross-sale potential (Haimowitz, Gur-Ali, and Schwarz 1997), and identifying faulty network circuits (Sasisekharan, Seshadri, and Weiss 1996). Success in these and other diverse areas has spawned tremendous interest in applying KDD to many other domains.

Before data can be mined for knowledge, preprocessing must pare it down to a potentially useful and usable set. How this is done is critical: paring away too much data can restrict and hamper the discovery of useful knowledge, while mining with redundant, missing, or inaccurate data can produce false or misleading results.

Several methods have been used to resolve the problem of missing data attributes. Typical methods of determining a record's usability are based on: (1) the percentage of data attributes missing; (2) assuming all records are useful and ignoring missing values; or (3) assuming all records are useful and inferring missing values (without considering individual attribute importance). This research introduces a model that determines the data mining usability of individual records from a combination of which attributes are missing and the relative importance of the various attributes as defined by the domain expert.

Knowledge-based instance selection (KbIS) is a model for determining record usability that incorporates domain knowledge about data attributes and is based on a fuzzy aggregation technique. When there are numerous missing data values, using KbIS for instance selection results in datasets that produce more accurate discovery results than datasets selected by a prescribed percentage of missing values (Famili 1997). Likewise, KbIS outperforms the default selection methods of discarding all records with missing values or using all records with missing values.
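The contrast between percentage-based selection and importance-aware selection can be illustrated with a small Python sketch. This is not the KbIS model itself: the abstract does not specify the fuzzy aggregation operator, so a normalized weighted mean stands in for it here as an assumption. The function names, attribute names, importance weights, and threshold are all hypothetical.

```python
from typing import Mapping, Optional

Record = Mapping[str, Optional[object]]


def fraction_missing(record: Record, attributes: list[str]) -> float:
    """Baseline: share of attributes with no value, ignoring importance."""
    missing = sum(1 for name in attributes if record.get(name) is None)
    return missing / len(attributes)


def weighted_usability(record: Record, importance: Mapping[str, float]) -> float:
    """Assumed stand-in for fuzzy aggregation: the importance-weighted
    share of attributes that are present, scaled to the range [0, 1]."""
    total = sum(importance.values())
    present = sum(w for name, w in importance.items()
                  if record.get(name) is not None)
    return present / total


def select_usable(records: list[Record], importance: Mapping[str, float],
                  threshold: float) -> list[Record]:
    """Keep records whose weighted usability meets the (hypothetical) threshold."""
    return [r for r in records if weighted_usability(r, importance) >= threshold]


# Hypothetical expert-assigned importance in [0, 1] for each attribute.
importance = {"age": 0.9, "income": 0.8, "fax": 0.1}

rec_a = {"age": 41, "income": 52000, "fax": None}        # unimportant value missing
rec_b = {"age": None, "income": None, "fax": "555-0100"}  # important values missing

usable = select_usable([rec_a, rec_b], importance, threshold=0.5)
```

Under these assumed weights, rec_a is missing one of three attributes but loses only the least important one, so its weighted usability stays high, while rec_b retains only the least important attribute and is filtered out. A pure percentage cutoff sees only "one missing" versus "two missing" and cannot take the expert's ranking into account.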
Keywords/Search Tags:Using, Data, Missing values, Discovery, Usability, Records, Domain