The Study Of The Data Imputation And Attributes Reduction Methods In Information Systems

Posted on: 2015-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: T S Yang
Full Text: PDF
GTID: 2308330470462002
Subject: Management Science and Engineering
Abstract/Summary:
Attribute reduction, one of the core problems of rough set theory, plays a key role in information processing and analysis. This thesis studies data imputation algorithms and attribute reduction algorithms based on multiple theories, namely rough set theory, information theory, machine learning theory and grey system theory. The main contents are as follows:

(1) The Pawlak rough set model requires incomplete data to be completed before it can be applied to incomplete information systems. Missing data imputation has proven to be an effective and practical strategy for this. The thesis proposes a novel feature-weighted grey KNN imputation algorithm to address the problem that classical grey k-nearest-neighbour imputation ignores feature relevance. Following information theory, mutual information (MI) is used to measure the relationship between features and to build a weight matrix of feature relevance. The approach then measures the relationship between instances with the grey relational grade and imputes missing values using weighted nearest neighbours. An experimental evaluation on UCI datasets shows that the method is superior to four other estimation strategies in terms of prediction accuracy. Moreover, when the approach is used for missing data imputation, the classification bias in downstream classification tasks is reduced.

(2) The classical rough set attribute reduction algorithm, which is based on equivalence relations, can only deal with discrete data, but real data is often hybrid, containing both discrete and continuous variables. Fuzzy-rough sets were introduced for this reason and have been applied successfully to hybrid data. To solve the problem that attribute reduction algorithms based on fuzzy-rough sets fail to remove any attributes in some cases, the thesis introduces α information entropy for measuring fuzzy similarity relations and proposes a new attribute significance measure based on it. With this measure as heuristic information, a hybrid attribute reduction algorithm is presented. By adjusting the entropy parameter α, multiple uncertainty measures and an optimal attribute reduct can be obtained. Based on extensive experiments, the recommended interval for α is 0.9 to 1.8. The algorithm is also compared experimentally with other popular attribute reduction algorithms on UCI datasets. The experiments demonstrate that the method is superior to the other attribute reduction methods in both classification accuracy and the number of attributes in the reduct.
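
The imputation idea in part (1) can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not give the formulas, so the sketch assumes the standard grey relational coefficient with distinguishing coefficient `rho = 0.5`, a simple histogram estimate of mutual information, MI weights normalised to sum to one, and a grade-weighted mean of the `k` best donors as the imputed value. The names `mutual_information`, `feature_weighted_grey_knn_impute`, `rho` and `bins` are hypothetical, and missing entries are assumed to be encoded as `None` in purely numeric rows.

```python
import math
from collections import Counter


def mutual_information(xs, ys, bins=4):
    """Histogram estimate of mutual information (in nats) between two numeric columns."""
    def discretize(values):
        lo, hi = min(values), max(values)
        if hi == lo:                      # constant column: a single bin
            return [0] * len(values)
        return [min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in values]

    bx, by = discretize(xs), discretize(ys)
    n = len(xs)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    mi = sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())
    return max(0.0, mi)                   # clamp tiny negative rounding errors


def feature_weighted_grey_knn_impute(rows, k=3, rho=0.5, bins=4):
    """Return a copy of rows with None entries filled (assumed scheme, see lead-in).

    Each incomplete row must have at least one observed feature, and at
    least one complete row must exist to act as a donor.
    """
    complete = [r for r in rows if None not in r]
    m = len(rows[0])
    cols = [[r[f] for r in complete] for f in range(m)]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]   # min-max scaling factors
    filled = [list(r) for r in rows]                   # do not mutate the input

    for row in filled:
        missing = [f for f in range(m) if row[f] is None]
        if not missing:
            continue
        observed = [f for f in range(m) if row[f] is not None]
        # Scaled absolute differences between this row and every complete row.
        diffs = [[abs(row[f] - c[f]) / spans[f] for f in observed] for c in complete]
        flat = [d for ds in diffs for d in ds]
        dmin, dmax = min(flat), max(flat)

        for j in missing:
            # Feature relevance: MI between each observed feature and the target feature.
            mi = [mutual_information(cols[f], cols[j], bins) for f in observed]
            total = sum(mi)
            weights = [v / total for v in mi] if total > 0 else [1.0 / len(mi)] * len(mi)

            graded = []
            for ds, c in zip(diffs, complete):
                # Grey relational coefficient per feature, then the weighted grade.
                grc = [1.0 if dmax == 0 else (dmin + rho * dmax) / (d + rho * dmax)
                       for d in ds]
                graded.append((sum(w * g for w, g in zip(weights, grc)), c[j]))

            top = sorted(graded, key=lambda t: t[0], reverse=True)[:k]
            row[j] = sum(g * v for g, v in top) / sum(g for g, _ in top)
    return filled


data = [
    [1.0, 5.0, 2.0],
    [2.0, 3.0, 4.0],
    [3.0, 8.0, 6.0],
    [4.0, 1.0, 8.0],
    [1.0, 5.0, None],   # third feature is missing
]
print(feature_weighted_grey_knn_impute(data, k=3)[-1])
```

With uniform weights in place of the MI weights, the same code reduces to an unweighted grey KNN imputation, which is the baseline the abstract says ignores feature relevance.
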
Keywords/Search Tags: information systems, data imputation, attributes reduction, information entropy, rough set