Font Size: a A A

The impact of Data Imputation Methodologies on Knowledge Discovery

Posted on:2009-05-09Degree:D.B.AType:Dissertation
University:Cleveland State UniversityCandidate:Brown, Marvin LFull Text:PDF
GTID:1448390002999143Subject:Business Administration
Abstract/Summary:
The purpose of this research is to investigate the impact of Data Imputation Methodologies that are employed when a specific Data Mining algorithm is utilized within a KDD (Knowledge Discovery in Databases) process. This study will employ certain Knowledge Discovery processes that are widely accepted in both the academic and commercial worlds. Several Knowledge Discovery models will be developed utilizing secondary data containing known correct values. Tests will be conducted on the secondary data both before and after storing data instances with known results and then identifying imprecise data values. One of the integral stages in the accomplishment of successful Knowledge Discovery is the Data Mining phase. The actual Data Mining process deals significantly with prediction, estimation, classification, pattern recognition and the development of association rules. Neural Networks are the most commonly selected tools for Data Mining classification and prediction. Neural Networks employ various types of Transfer Functions when outputting data. The most commonly employed Transfer Function is the s-Sigmoid Function. Various Knowledge Discovery Models from various research and business disciplines were tested using this framework. However, missing and inconsistent data has been pervasive problems in the history of data analysis since the origin of data collection. Due to advancements in the capacities of data storage and the proliferation of computer software, more historical data is being iii collected and analyzed today than ever before. The issue of missing data must be addressed, since ignoring this problem can introduce bias into the models being evaluated and lead to inaccurate data mining conclusions. The objective of this research is to address the impact of Missing Data and Data Imputation on the Data Mining phase of Knowledge Discovery when Neural Networks are utilized when employing an s-Sigmoid Transfer function, and are confronted with Missing Data and Data Imputation methodologies.
Keywords/Search Tags:Data imputation methodologies, Knowledge discovery, Impact, Missing data, Data mining, Transfer function
Related items