
Research On Feature Selection Algorithm Based On Rough Set Theory

Posted on: 2008-08-12
Degree: Master
Type: Thesis
Country: China
Candidate: L M Xu
Full Text: PDF
GTID: 2178360212468179
Subject: Computer application technology
Abstract/Summary:
Data mining is a technology that has developed rapidly since the 1990s. It processes large volumes of day-to-day operational data with the aim of extracting potentially useful but previously unknown information from the raw data. Machine learning provides the theoretical basis for data mining, including extracting information from the original database and expressing it in a comprehensible way. Machine learning algorithms typically place strict requirements on the data set, such as good integrity, little redundancy, and weak correlation. Real-world data, however, rarely meets these requirements. Data preprocessing is currently an effective way to reduce redundant or incomplete data before an algorithm is run.

Feature selection is an important component of data preprocessing. A good feature selection algorithm reduces the noise and dimensionality of a data set and thereby improves the performance of the machine learning algorithm applied to it. Feature selection has become an active research topic, and a number of well-established algorithms have been proposed. Rough set theory is a mathematical tool for characterizing imprecise, uncertain, and incomplete information. It has been widely used in machine learning, knowledge discovery, decision support, and other areas. The essence of rough set theory is data reduction, which can be used for feature selection and has already been applied successfully in some algorithms.

This thesis first introduces the theory related to feature selection, including feature evaluation and search methods. Second, it describes the basic concepts of rough set theory, in particular data reduction and obtaining reducts with a discernibility matrix. Third, building on the feature selection components of Weka, an open source data mining tool, it proposes a new feature selection algorithm based on rough set theory, which takes the set of core features as the initial set and defines a symmetrical uncertainty measure as the criterion for evaluating feature correlation. The algorithm considers the relationships not only between attributes but also between each attribute and the class. Finally, Naive Bayes and C4.5 are used in the experiments to evaluate the results of feature selection, and conclusions are drawn. The experimental results show that the algorithm performs better than previous algorithms on data with non-empty core feature sets.
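
The abstract names two building blocks without giving formulas: the core feature set obtained from a discernibility matrix, and a symmetrical uncertainty measure. The sketch below illustrates the common textbook forms of both, assuming discrete attribute values. It is not the thesis's algorithm; the function names and the toy data are hypothetical, and the exact measure defined in the thesis may differ.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """Textbook form: SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant; treat as uncorrelated
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def core_features(rows, decisions):
    """Indices of attributes that are the only difference between some
    pair of objects with different class labels, i.e. the singleton
    entries of the discernibility matrix."""
    core = set()
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] == decisions[j]:
            continue  # only pairs from different classes are compared
        differing = [k for k, (a, b) in enumerate(zip(rows[i], rows[j])) if a != b]
        if len(differing) == 1:
            core.add(differing[0])
    return core


# Hypothetical toy decision table: columns are (outlook, temperature).
rows = [("sunny", "hot"), ("sunny", "cool"), ("rainy", "hot"), ("rainy", "cool")]
decisions = ["no", "yes", "no", "yes"]

core = core_features(rows, decisions)
su_outlook = symmetrical_uncertainty([r[0] for r in rows], decisions)
su_temperature = symmetrical_uncertainty([r[1] for r in rows], decisions)
```

In this toy table only the temperature column (index 1) separates the classes, so it is the sole core feature and its symmetrical uncertainty with the class is 1, while the outlook column scores 0. A selection procedure of the kind the abstract describes would start from `core` and use such scores to decide which further attributes to add.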
Keywords/Search Tags: Data mining, Feature selection, Rough set, Weka, Feature evaluation