Feature Selection Approaches Based On Weighted Kernel Density Estimation

Posted on: 2022-09-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2518306731953289
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth of data dimensionality and sample counts, a large amount of redundant or irrelevant information is generated. This information reduces the generalization ability of the model and leads to overfitting. Feature selection, an important preprocessing step in data mining, chooses the features that best describe the raw data from high-dimensional data. In particular, when handling continuous data, feature selection approaches based on kernel density estimation entropy avoid discretization, obtain more accurate values of information measures, and assess the significance of features more effectively. However, existing feature selection methods based on kernel density estimation ignore the differing contributions of individual samples to the density estimate. They also cannot effectively evaluate the relevance and redundancy of features. To address these issues, this thesis designs two weighted kernel density estimation models and their corresponding feature selection methods. The main work is as follows:

· Considering the negative impact of label noise on kernel density estimation, a weight formula is designed from the global sample space. A global weighted kernel density estimation model is then constructed by combining this weight with kernel density estimation, and its relevant properties are studied. Next, the entropy structure of the global weighted kernel density estimation is defined on this model, and the roles of mutual information and multi-information in feature evaluation are analyzed theoretically. Subsequently, the Max-Relevance and Min-Class-relevant Redundancy (MRMCR) evaluation criterion and its feature selection approach are proposed. Finally, experimental results indicate that the approach is both robust and effective.

· To address unbalanced class distributions in data, a class weight is constructed from the class space. A class-weighted kernel density estimation model is then designed, and its theoretical properties are discussed. Next, the entropy structure of the class-weighted kernel density estimation is defined, and the concepts of independent classification information rate and redundant classification information rate are proposed through this entropy structure. The impact of the two information rates on evaluating decision-irrelevant information, independent classification information, and redundant classification information is analyzed in detail through Venn diagrams. Subsequently, a feature selection approach based on Maximizing Independent and Minimizing Redundant Classification Information Ratio (MIMRCIR) is defined. Finally, the experimental results show that the approach is robust and effective.
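
As a rough illustration of the general idea of weighting samples inside a kernel density estimate and then building an entropy on top of it, the following is a minimal one-dimensional sketch. The abstract does not state the thesis's weight formulas, so everything here is an assumption: the Gaussian kernel, the fixed bandwidth, the hypothetical helper `inverse_class_frequency_weights` (a common stand-in for a class weight under imbalance), and the resubstitution entropy estimate. It should not be read as the MRMCR or MIMRCIR method.

```python
import numpy as np


def weighted_kde(x_query, samples, weights, bandwidth):
    """Weighted Gaussian KDE in 1-D: p_hat(x) = sum_i w_i * K_h(x - x_i)."""
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    samples = np.asarray(samples, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so p_hat integrates to 1
    z = (x_query[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * bandwidth)
    return kernel @ weights


def inverse_class_frequency_weights(labels):
    """Hypothetical class weight: every class receives the same total mass.

    This is an illustrative assumption, not the weight defined in the thesis.
    """
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    per_class = {c: 1.0 / (len(classes) * n) for c, n in zip(classes, counts)}
    return np.array([per_class[c] for c in labels])


def weighted_entropy(samples, weights, bandwidth):
    """Resubstitution entropy estimate: H = -sum_i w_i * log p_hat(x_i)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    density = weighted_kde(samples, samples, weights, bandwidth)
    return float(-np.sum(weights * np.log(density)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Imbalanced toy feature: 90 samples of class 0, 10 samples of class 1.
    x = np.concatenate([rng.normal(0.0, 1.0, 90), rng.normal(4.0, 1.0, 10)])
    y = np.array([0] * 90 + [1] * 10)
    w_class = inverse_class_frequency_weights(y)
    w_uniform = np.full(len(x), 1.0 / len(x))
    print("entropy, uniform weights:", weighted_entropy(x, w_uniform, 0.5))
    print("entropy, class weights:  ", weighted_entropy(x, w_class, 0.5))
```

With uniform weights the estimator reduces to an ordinary kernel density estimate; with the class weights, the minority class contributes as much probability mass as the majority class, which is one simple way a density-based information measure can be kept from being dominated by the larger class.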
Keywords/Search Tags: Feature selection, Entropy structure, Global weighted kernel density estimation, Class-weighted kernel density estimation