
Mining noisy data: A prediction quality perspective

Posted on: 2006-12-04
Degree: Ph.D
Type: Thesis
University: The University of Iowa
Candidate: Shah, Shital Chamanlal
Full Text: PDF
GTID: 2458390008970105
Subject: Engineering
Abstract/Summary:
Traditional data mining approaches provide useful solutions in the service, finance, manufacturing, energy, and medical domains. The focus of this research is on mining noisy, temporal, and high-dimensional data in medical and energy applications. The proposed solution approach produces reliable diagnosis and fault detection results.

Parameter reduction techniques such as weighted decision trees, genetic-algorithm-based techniques, and parameter set intersection are applied to a genetic dataset. A significant gene subset was identified (85% reduction in parameters with 10% increased accuracy) in the presence of ill-defined decision values, dimensionality issues, and low-density data.

Data transformation and ensemble decision-making are used to predict the survival of kidney dialysis patients. The developed approach handles noisy parameters, increases prediction quality, and provides insight into the role of dialysis-related parameters and outcomes. These insights may help patients with low life expectancy live healthier and longer lives.

Bladder cancer immunotherapy data and Pima Indian diabetes data exhibit noisy instances and inseparable decision boundaries, leading to low prediction accuracy. A relabeling algorithm is presented that iteratively retrieves, selects, and relabels data instances and computes a confidence index. This algorithm preserves significant medical explanations and provides a subset of the most stable instances (about 10% of the population) with high confidence (64% to 95%) for each application. Better prediction estimates help domain experts make well-informed treatment-related decisions.

Predicting temporally evolving patterns from sparse fault data in real time is of prime importance. A simple, robust, data-driven, modular alarm system that predicts incoming faults is presented. A time-lag-based decision-making logic successfully identified abnormal states approximately 6 to 16 hours prior to actual faults for two water chemistry systems.

Various applications have shown that high prediction accuracy can be accomplished for noisy, sparse, and temporal datasets in the presence of ill-defined decision values. The solution approaches discussed in the thesis are applicable across different domains with minor modifications and will lead to customized predictions, treatments, and control strategies, improving quality of life and system reliability.
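The abstract does not spell out the time-lag decision logic, so the following is only a minimal sketch of one way such an alarm could work, not the thesis's method. It assumes each time step has already been classified as abnormal or normal, and it raises an alarm when enough of the most recent readings are abnormal. The function name `lagged_alarm` and the parameters `window` and `min_hits` are hypothetical, chosen for illustration.

```python
from collections import deque


def lagged_alarm(abnormal_flags, window=6, min_hits=4):
    """Return one alarm decision per time step.

    Hypothetical rule (an assumption, not taken from the thesis):
    the alarm is on at a step when at least `min_hits` of the last
    `window` readings, including the current one, were abnormal.
    """
    recent = deque(maxlen=window)  # sliding window of recent flags
    alarms = []
    for flag in abnormal_flags:
        recent.append(bool(flag))
        alarms.append(sum(recent) >= min_hits)
    return alarms


# Example: hourly abnormal/normal flags from some upstream classifier.
flags = [0, 0, 1, 1, 1, 1, 0, 0]
alarms = lagged_alarm(flags, window=4, min_hits=3)
```

Requiring several abnormal readings within the window trades a short delay for fewer false alarms from isolated noisy readings, which is one plausible motivation for lag-based logic on noisy sensor data.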
Keywords/Search Tags: Data, Prediction, Mining, Quality, Noisy