Kernel Density Estimation Entropy For Hybrid Data And A Fast Greedy Feature Selection Algorithm

Posted on: 2018-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: J H Zhang
GTID: 2348330512999444
Subject: Computer technology
Abstract/Summary:
In the era of big data, feature selection is a key step in data mining, pattern recognition and machine learning, and it plays an increasingly important role in reducing dimensionality and improving the speed and accuracy of algorithms. The concepts of entropy and mutual information make information theory a significant framework for feature selection. It can detect non-linear relationships between variables without prior knowledge, and it is robust to noise and invariant to data transformation. However, feature selection methods based on information theory mainly deal with discrete data, while many real-world data sets have continuous or hybrid features. Common solutions such as discretization work by transforming the data; they cannot directly compute the probabilities that information theory requires, and they may lose information from the original data. Kernel density estimation (KDE) from statistics is more direct, as a non-parametric way of estimating the probability density function of a random variable. Inspired by it, some researchers have proposed a conditional entropy based on KDE, applied it to feature selection, and demonstrated its effectiveness experimentally.

However, existing research offers only a few KDE entropy formulas, and all of them focus on continuous data. Worse, computing KDE entropy takes too much time, especially in high dimensions. As a result, feature selection algorithms based on KDE entropy cannot be widely used, because there are too few kinds of them and their efficiency is too low.

To solve these problems, this thesis proposes the hybrid KDE entropy and a corresponding fast greedy feature selection algorithm, and demonstrates their effectiveness and efficiency through theoretical analysis and experimental results. Our major work includes:

· Presenting a more complete continuous KDE entropy and proposing the hybrid KDE entropy, which unifies the classical discrete entropy and the continuous KDE entropy.
· Introducing a greedy feature selection algorithm based on hybrid KDE conditional entropy, analyzing its time complexity, and demonstrating its effectiveness experimentally.
· Putting forward the new concepts of the kernel matrix, data vector, partition matrix and kernel partition matrix, all of which have the incremental property, and deriving the hybrid KDE entropy in matrix view, which is equivalent to the hybrid KDE entropy as defined.
· Proposing a fast feature selection method based on hybrid KDE conditional entropy in matrix view, which incrementally computes the discrete and continuous parts in each round, compresses in a timely manner, and accumulates the speed advantage of the matrix view; analyzing its time complexity; and demonstrating its high efficiency experimentally.
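
The abstract does not give the thesis's formulas, so the following is only an illustrative sketch of how a KDE-style conditional entropy for hybrid data and a greedy forward selection might fit together. Everything here is an assumption rather than the author's method: the function names (`kernel_matrix`, `conditional_entropy`, `greedy_select`), the product kernel (Gaussian on continuous columns, exact-match indicator on discrete columns), the resubstitution estimate of H(y | X), and the fixed bandwidth of 0.5. With discrete columns only, this particular estimate reduces to the classical plug-in conditional entropy, which is one possible reading of "unifying" the two cases. The elementwise product used to add a feature is likewise only one plausible interpretation of the "incremental property".

```python
import numpy as np


def kernel_matrix(X, is_discrete, bandwidth=0.5):
    """n x n product kernel over the columns of X (assumed form, see lead-in).

    Discrete columns use an exact-match indicator; continuous columns use an
    unnormalized Gaussian with a fixed, arbitrarily chosen bandwidth.
    """
    n = X.shape[0]
    K = np.ones((n, n))
    for col, discrete in zip(X.T, is_discrete):
        diff = col[:, None] - col[None, :]
        if discrete:
            K *= (diff == 0)
        else:
            K *= np.exp(-0.5 * (diff / bandwidth) ** 2)
    return K


def conditional_entropy(K, y):
    """Resubstitution estimate of H(y | X) in nats from a kernel matrix K."""
    classes = np.unique(y)
    onehot = (y[:, None] == classes[None, :]).astype(float)  # n x c
    post = K @ onehot                                # kernel-weighted class counts
    post /= post.sum(axis=1, keepdims=True)          # estimated p(c | x_i)
    safe = np.where(post > 0, post, 1.0)             # log(1) = 0 where p = 0
    return -np.mean(np.sum(post * np.log(safe), axis=1))


def greedy_select(X, y, is_discrete, k, bandwidth=0.5):
    """Greedy forward selection of k features minimizing the estimated H(y | X).

    Per-feature kernel matrices are cached (O(d * n^2) memory) so that adding
    a candidate feature is a single elementwise product.
    """
    n, d = X.shape
    per_feature = [
        kernel_matrix(X[:, [j]], [is_discrete[j]], bandwidth) for j in range(d)
    ]
    selected, K_sel = [], np.ones((n, n))
    for _ in range(k):
        best_j, best_h = None, np.inf
        for j in range(d):
            if j in selected:
                continue
            h = conditional_entropy(K_sel * per_feature[j], y)
            if h < best_h:
                best_j, best_h = j, h
        selected.append(best_j)
        K_sel = K_sel * per_feature[best_j]  # incremental update of the kernel
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=200)
    X = np.column_stack([
        3.0 * y + rng.normal(0, 0.3, 200),  # informative continuous feature
        rng.normal(0, 1, 200),              # continuous noise
        rng.integers(0, 3, 200),            # discrete noise
    ])
    print(greedy_select(X, y, is_discrete=[False, False, True], k=2))
```

This sketch costs O(n^2) per candidate feature per round and makes no attempt at the compression step the abstract mentions; a practical implementation would also need a data-driven bandwidth.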
Keywords/Search Tags: Kernel Density Estimation, Entropy, Hybrid Data, Feature Selection