
Research On Feature Selection Algorithm Based On Rough Sets

Posted on: 2014-07-18
Degree: Master
Type: Thesis
Country: China
Candidate: C W Li
GTID: 2268330401462537
Subject: Computer software and theory
Abstract/Summary:
Rough set theory, proposed by the Polish mathematician Z. Pawlak, is a soft computing tool for handling imprecise and uncertain data and is one of the active research topics in artificial intelligence. Because of its distinctive and innovative approach, rough set theory has attracted considerable attention over the past 30 years. Researchers have developed several generalized rough set models, such as fuzzy rough sets, dominance-based rough sets, decision-theoretic rough sets and variable precision rough sets. These models have been applied successfully in a wide range of fields, including machine learning, pattern recognition, decision support, process control, knowledge discovery in databases and expert systems.

Feature selection based on rough sets, also called attribute reduction, is a key concept in rough set theory. Its aim is to retain the ability of the original features to discern the objects in the universe. When predictive models are constructed, removing redundant features through feature selection can improve model interpretability and generalization. With the emergence of large-scale, high-dimensional data sets, feature selection has become especially important for big data, which has high overall value but low value density.

In this thesis, existing efficient attribute reduction algorithms are analyzed, and a new attribute reduction algorithm based on PageRank is proposed that selects useful features from an ordered feature sequence. In addition, a class library for rough sets and data preprocessing (RSLibrary) is constructed, and a rough set data analysis system is designed on the basis of RSLibrary. The main work of this thesis is as follows:
(1) Heuristic attribute reduction algorithms are analyzed and compared. Specifically, the classical heuristic attribute reduction algorithm and accelerated attribute reduction algorithms, including two accelerated variants, are analyzed and compared in detail.
(2) An attribute reduction algorithm based on "global" attribute importance is proposed.
By combining rough set theory with PageRank, the thesis proposes an attribute ranking algorithm (AttributeRank) and then designs an attribute reduction algorithm based on the resulting attribute ranks. With a parallel version running on distributed systems, the new algorithm can obtain an ordered feature set efficiently.
(3) A rough set data analysis platform is designed. A class library (RSLibrary) containing attribute reduction algorithms and data preprocessing techniques is constructed, and a rough set data analysis platform is developed on the basis of RSLibrary.

The final part of the thesis summarizes the main content and indicates directions for further research. The parallel version of attribute reduction offers a feasible way of dealing with big data, provides new lessons for exploring efficient data mining techniques, and promotes development in the field of artificial intelligence.
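The abstract describes attribute reduction as keeping the original features' ability to discern objects and takes the classical heuristic reduction algorithm as its starting point. The Python sketch below illustrates a common textbook form of that heuristic under stated assumptions: attribute significance is measured by the positive region and dependency degree (the abstract does not say which measure the thesis uses), a decision table is represented as integer tuples plus a separate list of decision values, and the names `partition`, `pos_size`, `dependency` and `greedy_reduct` are hypothetical. It is not the thesis's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Assumed representation: one object = tuple of condition-attribute values;
# decision values are kept in a separate list. All names are hypothetical.
Row = Tuple[int, ...]


def partition(rows: Sequence[Row], attrs: Sequence[int]) -> List[List[int]]:
    """Equivalence classes U/IND(attrs): objects with identical values on attrs."""
    blocks: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def pos_size(rows: Sequence[Row], decisions: Sequence[int], attrs: Sequence[int]) -> int:
    """|POS_attrs(D)|: objects whose equivalence class has a single decision value."""
    return sum(
        len(block)
        for block in partition(rows, attrs)
        if len({decisions[i] for i in block}) == 1
    )


def dependency(rows: Sequence[Row], decisions: Sequence[int], attrs: Sequence[int]) -> float:
    """Dependency degree gamma_attrs(D) = |POS_attrs(D)| / |U|."""
    return pos_size(rows, decisions, attrs) / len(rows)


def greedy_reduct(rows: Sequence[Row], decisions: Sequence[int]) -> List[int]:
    """Forward selection by positive-region gain, then a backward redundancy pass."""
    all_attrs = list(range(len(rows[0])))
    target = pos_size(rows, decisions, all_attrs)
    reduct: List[int] = []
    while pos_size(rows, decisions, reduct) < target:
        candidates = [a for a in all_attrs if a not in reduct]
        # Add the attribute whose inclusion yields the largest positive region
        # (ties go to the lowest attribute index).
        best = max(candidates, key=lambda a: pos_size(rows, decisions, reduct + [a]))
        reduct.append(best)
    # Backward pass: drop any attribute not needed to keep the full positive region.
    for a in list(reduct):
        trial = [b for b in reduct if b != a]
        if pos_size(rows, decisions, trial) == target:
            reduct = trial
    return reduct


if __name__ == "__main__":
    # Toy table: attribute 2 alone determines the decision.
    rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
    decisions = [0, 1, 1, 0]
    print(greedy_reduct(rows, decisions))  # [2]
```

The backward pass matters because forward selection alone can keep attributes that later additions make redundant; after the pass, removing any remaining attribute shrinks the positive region, so the result is a reduct in the rough set sense.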
Keywords/Search Tags: Rough Sets, Feature Selection, AttributeRank, RSLibrary, Dissimilarity Coefficient of Attribute
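The abstract states that AttributeRank combines rough set theory with PageRank to produce an ordered feature sequence from which a reduct is selected, but it does not give the graph construction or the definition of the dissimilarity coefficient of attribute. The Python sketch below is therefore only one possible reading, under loud assumptions: the edge weight `w[j][i]` (the number of object pairs that attribute i discerns and attribute j does not) is a hypothetical stand-in for a dissimilarity measure, and `attribute_graph`, `pagerank` and `ranked_reduct` are hypothetical names. It is a sequential sketch and does not attempt the parallel or distributed version mentioned in the abstract.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, ...]  # one object's condition-attribute values (assumed layout)


def pos_size(rows: Sequence[Row], decisions: Sequence[int], attrs: Sequence[int]) -> int:
    """|POS_attrs(D)|: objects whose class under attrs has a single decision value."""
    blocks: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return sum(len(b) for b in blocks.values() if len({decisions[i] for i in b}) == 1)


def attribute_graph(rows: Sequence[Row]) -> List[List[float]]:
    """HYPOTHETICAL weighting, not the thesis's dissimilarity coefficient:
    w[j][i] counts object pairs that attribute i discerns but attribute j
    does not, so j "votes" for i in proportion to what i adds."""
    n = len(rows[0])
    w = [[0.0] * n for _ in range(n)]
    for x, y in combinations(rows, 2):
        for i in range(n):
            for j in range(n):
                if i != j and x[i] != y[i] and x[j] == y[j]:
                    w[j][i] += 1.0
    return w


def pagerank(w: List[List[float]], damping: float = 0.85,
             iters: int = 200, tol: float = 1e-12) -> List[float]:
    """Weighted PageRank by power iteration; nodes with no outgoing weight
    spread their rank uniformly, so the scores always sum to 1."""
    n = len(w)
    out = [sum(row) for row in w]
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                share = w[i][j] / out[i] if out[i] > 0 else 1.0 / n
                new[j] += damping * rank[i] * share
        converged = sum(abs(a - b) for a, b in zip(new, rank)) < tol
        rank = new
        if converged:
            break
    return rank


def ranked_reduct(rows: Sequence[Row], decisions: Sequence[int]) -> List[int]:
    """Add attributes in descending rank until the full positive region is
    reached, then drop redundant ones, trying the lowest-ranked first."""
    scores = pagerank(attribute_graph(rows))
    order = sorted(range(len(scores)), key=lambda a: -scores[a])
    target = pos_size(rows, decisions, order)
    reduct: List[int] = []
    for a in order:
        if pos_size(rows, decisions, reduct) == target:
            break
        reduct.append(a)
    for a in reversed(list(reduct)):
        trial = [b for b in reduct if b != a]
        if pos_size(rows, decisions, trial) == target:
            reduct = trial
    return reduct


if __name__ == "__main__":
    rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
    decisions = [0, 1, 1, 0]  # decision = attribute 0 XOR attribute 1
    print(pagerank(attribute_graph(rows)))
    print(ranked_reduct(rows, decisions))
```

Selecting in rank order and then pruning keeps the result a reduct whatever ranking is used; the ranking only decides which of several possible reducts is found and how many attributes are added before the full positive region is reached.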