The Study Of The Data Imputation And Attributes Reduction Methods In Information Systems

Posted on:2015-03-12

Degree:Master

Type:Thesis

Country:China

Candidate:T S Yang

Full Text:PDF

GTID:2308330470462002

Subject:Management Science and Engineering

Abstract/Summary:

Attribute reduction, one of the core problems of rough set theory, plays a key role in information analysis and processing. This thesis studies data imputation algorithms and attribute reduction algorithms that draw on multiple theories, namely rough set theory, information theory, machine learning theory and grey system theory. The main contents are as follows:

(1) The Pawlak rough set model requires incomplete data to be completed before it can handle incomplete information systems, and missing data imputation has proven to be an effective and practical strategy for doing so. The thesis proposes a novel feature-weighted grey KNN imputation algorithm to address the problem that classical grey k-nearest-neighbour imputation ignores feature relevance. Following information theory, mutual information (MI) is used to measure the relations between random features and to establish a weight matrix of feature relevance. The approach then measures the relationship between instances using the grey relational grade and imputes missing data using weighted nearest neighbours. An experimental evaluation on UCI datasets shows that the method is superior to four other estimation strategies in terms of prediction accuracy. Moreover, when the approach is applied to missing data imputation, classification bias in subsequent classification tasks is reduced.

(2) The classical rough set attribute reduction algorithm, which is based on equivalence relations, can only deal with discrete data, but real data is often hybrid, containing both discrete and continuous variables. The fuzzy-rough set was therefore introduced and has been applied successfully to hybrid data. To solve the problem that attribute reduction algorithms based on fuzzy-rough sets cannot delete attributes in some cases, the thesis introduces α information entropy for measuring the fuzzy similarity relation and proposes a new attribute significance measure based on it. With this measure as heuristic information, a hybrid attribute reduction algorithm is presented. By adjusting the information entropy parameter α, multiple uncertainty measures and an optimal attribute reduction set can be obtained. Based on extensive experiments, the optimal interval for the parameter α is found to range from 0.9 to 1.8. The method is also compared experimentally with other popular attribute reduction algorithms on UCI datasets. The experiments demonstrate that it is superior to the other attribute reduction methods in both classification accuracy and the number of attributes in the reduct.
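
The imputation procedure described in part (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes numeric features on comparable scales, a distinguishing coefficient `rho` of 0.5 (a common default in grey relational analysis), and per-feature `weights` that have already been computed elsewhere, for example from mutual information. The function names, the row layout and the rule of averaging neighbours in proportion to their grade are all hypothetical choices made for illustration.

```python
def grey_relational_grades(target, candidates, features, weights, rho=0.5):
    """Weighted grey relational grade of each candidate row with respect to
    `target`, computed over the feature indices in `features` only."""
    if not features:
        raise ValueError("at least one observed feature is required")
    # Absolute difference between target and each candidate, per feature.
    diffs = [[abs(target[j] - row[j]) for j in features] for row in candidates]
    flat = [d for row in diffs for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate matches the target exactly on the observed features.
        return [1.0] * len(candidates)
    w_total = sum(weights[j] for j in features)
    grades = []
    for row in diffs:
        # Grey relational coefficient for each feature of this candidate.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        # Feature-weighted average of the coefficients (weights are assumed
        # positive and supplied by the caller, e.g. MI-based relevance).
        grade = sum(weights[j] * c for j, c in zip(features, coeffs)) / w_total
        grades.append(grade)
    return grades


def impute_value(target, complete_rows, missing_idx, weights, k=3, rho=0.5):
    """Estimate target[missing_idx] from the k complete rows with the highest
    weighted grey relational grade; missing entries are marked with None."""
    features = [j for j, v in enumerate(target)
                if v is not None and j != missing_idx]
    grades = grey_relational_grades(target, complete_rows, features, weights, rho)
    ranked = sorted(zip(grades, complete_rows),
                    key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(g for g, _ in ranked)
    # Grade-weighted mean of the neighbours' values (an assumed combination rule).
    return sum(g * row[missing_idx] for g, row in ranked) / total


if __name__ == "__main__":
    rows = [[1.0, 2.0, 10.0], [1.0, 2.0, 20.0], [9.0, 9.0, 100.0]]
    print(impute_value([1.0, 2.0, None], rows, missing_idx=2,
                       weights=[0.5, 0.5, 0.0], k=2))
```

In this toy data the two rows that agree with the target on the observed features dominate the estimate, while the distant third row receives a much lower grade. Categorical attributes would need a voting rule instead of a weighted mean, which the sketch does not cover.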

Keywords/Search Tags:

information systems, data imputation, attributes reduction, information entropy, rough set

Related items

1	Research Of Knowledge Reduction Algorithm Based On The Relativity Of Attributes In Information Systems
2	Research On The Method Of Intelligent Data Analysis Based On Rough Set And Concept Lattice
3	Research And Application Of Data Reduction Algorithms Based On Rough Entropy
4	Data Reduction Algorithm Based On Information Entropy
5	Rough Set Extended Model And Attributes Reduction In Incomplete Information Systems
6	Incomplete Information Systems, Rough Set Attribute Reduction Evolutionary Algorithm And Applied Research
7	Study On Attribute Reduction Criteria And Information Loss Of Attribute Reduction Based On Rough Sets
8	Research Of Reduction Algorithms Based On Rough Set Theory
9	The Study Of Knowledge Discovery And Attributes Reduction In Set-valued Information Systems
10	Attribute Reduction Of Rough Set Based On Conditional Information Entropy In General Binary Relation