Research On Improved Feature Selection Based On Partition Differentiation Entropy

Posted on: 2019-05-25
Degree: Master
Type: Thesis
Country: China
Candidate: Q Sun
GTID: 2348330542972029
Subject: Software engineering
Abstract/Summary:
Feature selection is an effective technique for reducing the dimensionality of data. Its main task is to select the most representative subset of features from the original dataset, so that after dimensionality reduction the dataset still expresses the meaning of the original information and can still be used to predict unknown data. Among current feature selection algorithms, the rough-set feature selection model mainly handles discrete datasets and cannot deal directly with continuous data, yet most real-life datasets are continuous. Some scholars have therefore combined fuzzy sets with rough sets to form fuzzy-rough sets for feature selection. Common fuzzy-rough feature selection algorithms are based either on the positive region (dependency) or on information entropy, but these algorithms are time-consuming.

In the rough-set feature selection model, the partition differentiation entropy model uses the idea of partitioning to divide the original information system into multiple sub-information systems, and it computes the partition differentiation entropy on these sub-systems instead of directly on the entire set of condition attributes. This shortens the time complexity while achieving the same classification ability as the traditional information entropy model. However, because it is built on rough sets, the partition differentiation entropy model can only deal with discrete datasets.

Based on the characteristics of partition differentiation entropy in the rough-set feature selection model, this paper proposes a fuzzy-rough feature selection algorithm based on Lambda partition differentiation entropy (LDE-FRFS). First, the original decision system is divided into several sub-systems, and the local importance of the attributes is evaluated in each sub-system. Then, the global importance of the attributes is calculated from their local importance, and the reduction result is obtained. Compared with the other entropy methods in fuzzy-rough sets, LDE-FRFS has the lowest time complexity under the same premise, and even increasing … Unlike the partition differentiation entropy model, the algorithm can deal with real-valued data directly.

This paper also finds that feature selection based on partition differentiation entropy is strongly affected by noisy data. To reduce this impact, a different classes' ratio algorithm is incorporated into LDE-FRFS. The resulting algorithm, Lambda partition differentiation entropy based on different classes' ratio, improves classification accuracy. Both proposed algorithms are applied to feature selection on nine benchmark datasets and compared with four algorithms: the Lambda conditional entropy feature selection algorithm, the fuzzy-rough quick reduct algorithm, the fuzzy entropy based feature selection algorithm, and principal component analysis. The experimental results show that the proposed algorithms outperform these traditional feature selection algorithms.
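The abstract gives neither the formula for Lambda partition differentiation entropy nor the rule used to form the sub-systems, so the sketch below only illustrates the general "split into sub-systems, score locally, aggregate globally" pattern on real-valued data. Everything in it is an assumption: the standard fuzzy-rough dependency degree (the measure behind the fuzzy-rough quick reduct baseline) stands in for the thesis's entropy, objects are dealt round-robin into hypothetical sub-systems, local importance is the single-attribute dependency inside a sub-system, and global importance is the plain mean. The names `similarity`, `dependency` and `partitioned_importance` are hypothetical. In this sketch, splitting n objects into k blocks reduces the pairwise comparisons per attribute from about n² to about n²/k, which is one plausible reading of where a partition-based method saves time.

```python
from typing import List, Sequence


def similarity(u: float, v: float, span: float) -> float:
    """Range-normalised fuzzy similarity of two values of one attribute (assumed form)."""
    if span == 0:
        return 1.0  # a constant attribute cannot tell objects apart
    return max(0.0, 1.0 - abs(u - v) / span)


def dependency(rows: Sequence[Sequence[float]], labels: Sequence[int],
               attrs: Sequence[int], spans: Sequence[float]) -> float:
    """Fuzzy-rough dependency degree of the decision on the attributes in attrs.

    Attributes are combined with the min t-norm; the lower approximation uses
    the Lukasiewicz implicator, which for crisp decision classes reduces to
    mu_POS(x) = min over y with a different label of (1 - R_P(x, y)).
    """
    n = len(rows)
    total = 0.0
    for i in range(n):
        mu = 1.0  # stays 1 if no object with a different label exists
        for j in range(n):
            if labels[j] != labels[i]:
                r_p = min(similarity(rows[i][a], rows[j][a], spans[a]) for a in attrs)
                mu = min(mu, 1.0 - r_p)
        total += mu
    return total / n


def partitioned_importance(rows: Sequence[Sequence[float]], labels: Sequence[int],
                           n_blocks: int) -> List[float]:
    """Hypothetical partition-then-aggregate attribute scoring (not the thesis's LDE-FRFS)."""
    m = len(rows[0])
    # Global ranges, so every sub-system measures similarity on the same scale.
    spans = [max(r[a] for r in rows) - min(r[a] for r in rows) for a in range(m)]
    # Hypothetical sub-systems: objects dealt out round-robin into n_blocks groups.
    blocks = [list(range(b, len(rows), n_blocks)) for b in range(n_blocks)]
    blocks = [blk for blk in blocks if blk]  # drop empty groups when n_blocks > n
    scores = [0.0] * m
    for blk in blocks:
        sub_rows = [rows[i] for i in blk]
        sub_labels = [labels[i] for i in blk]
        for a in range(m):
            # Local importance: single-attribute dependency inside this sub-system.
            scores[a] += dependency(sub_rows, sub_labels, [a], spans)
    # Global importance: mean of the local scores over all sub-systems.
    return [s / len(blocks) for s in scores]


if __name__ == "__main__":
    rows = [[0.0, 0.5], [0.1, 0.9], [0.9, 0.4], [1.0, 0.8]]
    labels = [0, 0, 1, 1]
    imp = partitioned_importance(rows, labels, n_blocks=2)
    ranking = sorted(range(len(imp)), key=lambda a: imp[a], reverse=True)
    print([round(v, 3) for v in imp], ranking)  # [0.9, 0.2] [0, 1]
```

With two sub-systems, attribute 0 (which separates the classes) scores about 0.9 and attribute 1 about 0.2, so attribute 0 ranks first. On the whole dataset (n_blocks=1) attribute 0 scores 0.85 instead, so this toy measure does not preserve exact values under partitioning; the abstract's claim of equal classification ability concerns the thesis's own entropy model, not this sketch.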
Keywords/Search Tags: Feature Selection, Rough Set, Fuzzy-rough Set, Partition Differentiation Entropy, Different Classes' Ratio Algorithm