The Research Of Feature Selection Algorithms Based On Analysis Of Relevancy And Redundancy

Posted on: 2014-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: N Y Xiao
Full Text: PDF
GTID: 2248330398950118
Subject: Computer application technology
Abstract/Summary:
With the rapid development of modern technology, there are more and more ways to acquire data, which has led to explosive growth in data volume. At the same time, the amount of noise and irrelevant information is also increasing. For this reason, data mining plays an increasingly important role: it extracts valuable information from massive amounts of data and supports the analysis and interpretation of that data.

Feature selection is an important part of data mining and an active research topic. Feature selection algorithms can not only remove noise and reduce redundancy effectively, but also improve classification performance. The genetic algorithm (GA) is a typical wrapper feature selection method, and it has attracted a lot of attention because of its strong ability to solve different kinds of problems. Based on an analysis of the relevancy between features and classes and the redundancy between features, a feature selection method based on feature grouping and genetic algorithm (FS-FGGA) is put forward. First, relevance and redundancy are analyzed using symmetrical uncertainty. Then the relevant features are partitioned into groups using the approximate Markov blanket. Finally, based on the feature groups, the optimal combined feature subset is obtained by GA.

The other contribution of this work is the dynamic relevance analysis based forward feature selection (DRFFS) method, a hybrid of the filter and wrapper approaches. DRFFS measures the total relevancy of a feature with the classes by combining the scores of multiple filter algorithms. It then uses the fused score and the redundancy of the candidate feature with the selected subset to dynamically adjust the complementarity of the candidate feature. A ranking-based forward search strategy is applied to choose the best feature subset.

Combining feature grouping with the genetic algorithm speeds up the search and improves the quality of the solution space. Experimental results on eight public datasets show that the classification accuracy of FS-FGGA is better than that of SVM-RFE and ECBGS in most cases. DRFFS not only chooses features that are highly relevant to the classes, but also reduces the redundancy within the feature subset. Six public datasets are used to demonstrate the superiority of DRFFS. The results show that DRFFS obtains the best classification accuracy in most cases, and that classification sensitivity and specificity are improved at the same time.
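
The first two steps of FS-FGGA described above (relevance and redundancy analysis by symmetrical uncertainty, then grouping by approximate Markov blanket) can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes discrete-valued features, uses the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), and uses the common approximate Markov blanket criterion: a feature F_i covers F_j when SU(F_i, C) >= SU(F_j, C) and SU(F_i, F_j) >= SU(F_j, C). The function names, the `threshold` parameter, and the rule of assigning each feature to the first group leader that covers it are assumptions made for illustration. The GA search over the groups is not shown.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def group_features(features, labels, threshold=0.0):
    """Group relevant features by approximate Markov blanket.

    features:  dict mapping feature name -> list of discrete values
    labels:    list of class labels, same length as each feature column
    threshold: hypothetical relevance cutoff (an assumption, not from the source)
    Returns a list of groups; the first member of each group is its leader.
    """
    relevance = {name: symmetrical_uncertainty(col, labels)
                 for name, col in features.items()}
    # Keep only relevant features, ranked by decreasing relevance to the class.
    ranked = sorted((n for n in relevance if relevance[n] > threshold),
                    key=lambda n: relevance[n], reverse=True)
    groups = []
    for name in ranked:
        for group in groups:
            leader = group[0]
            # The leader is at least as relevant as `name` (by ranking), so it
            # covers `name` when their mutual SU reaches the relevance of `name`.
            redundancy = symmetrical_uncertainty(features[leader], features[name])
            if redundancy >= relevance[name]:
                group.append(name)
                break
        else:
            groups.append([name])  # no leader covers it: start a new group
    return groups
```

Calling `group_features({"f1": [0, 0, 1, 1], "f2": [1, 1, 0, 0], "f3": [0, 1, 0, 1]}, [0, 0, 1, 1])` places `f1` and `f2` in one group, because each fully determines the other, and drops `f3`, which carries no information about the class. A wrapper search such as GA could then pick combinations of features across such groups.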
Keywords/Search Tags: Feature Selection, Feature Grouping, Genetic Algorithm, Dynamic Relevance