
Semi-supervised Feature Selection Based On Kernel Density Estimation

Posted on: 2019-12-13
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Xu
GTID: 2428330626952110
Subject: Computer technology
Abstract/Summary:
In the big data era, the rapid growth and diversity of data pose challenges for machine learning and data mining. Feature selection is an important part of feature engineering: its purpose is to select a subset of features relevant to the task and to eliminate redundant ones. On the one hand, feature selection reduces computational cost and can improve model accuracy; on the other hand, a model built on fewer features is easier to interpret. In many applications, unlabeled data are easy to obtain, whereas fully labeled data are relatively difficult to obtain. Training data therefore typically consist of a small amount of labeled data and a large amount of unlabeled data, which motivates the study of semi-supervised feature selection. Kernel density estimation is a non-parametric density estimation method that makes no prior assumptions about the data distribution: the estimated density at a sample is the average of the kernel contributions from all other samples. It is currently widely used in image/video annotation, signal processing, network fault detection, and other fields. In this thesis, kernel density estimation is introduced into information-theoretic feature selection and into sparse-model-based semi-supervised feature selection, respectively, yielding two semi-supervised feature selection methods based on kernel density estimation.

(1) Semi-supervised feature selection based on kernel density estimation entropy. Traditional information-theoretic feature selection methods discretize continuous data, which loses information; kernel density estimation avoids this discretization. We extend feature selection based on kernel density estimation entropy so that it applies to partially labeled data. The method uses a heuristic search strategy, takes mutual information as the evaluation criterion, and uses the kernel function of kernel density estimation as the distance metric. Exploiting the relationship between labeled and unlabeled data, it computes the probability that each unlabeled sample belongs to each class, giving closer labeled samples larger weights. The kernel density estimation entropy is thereby extended from the supervised setting to the semi-supervised setting. Classification and multi-label learning experiments demonstrate the effectiveness of the method.

(2) Semi-supervised feature selection based on a sparse model and kernel density estimation. Semi-supervised feature selection based on sparse and graph models uses a sparse model to select features and graph-based semi-supervised learning to learn the label probability distribution of the samples. Semi-supervised kernel density estimation is a semi-supervised learning method: it uses Bayes' theorem to extend the kernel-density-based posterior probability so that unlabeled data are handled in a unified way with labeled data, and it solves for the posterior probabilities iteratively. We introduce the idea of semi-supervised kernel density estimation into sparse-model-based semi-supervised feature selection. Experiments show that the proposed method performs well compared with semi-supervised feature selection based on sparse and graph models.
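To make method (1) more concrete, the following NumPy sketch estimates mutual information from kernel density estimates on partially labeled data and uses it in a greedy forward search. It is not the thesis's implementation: the Gaussian kernel, the bandwidth `h`, the resubstitution entropy estimate, and the rule that an unlabeled sample's class weights are proportional to its summed kernel similarity to the labeled samples of each class are assumptions made for illustration, and all function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, h):
    """Unnormalised Gaussian kernel exp(-||a - b||^2 / (2 h^2)) for all row pairs."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * h * h))


def soft_labels(X, y, labeled, n_classes, h):
    """Hypothetical rule: labeled rows are one-hot; an unlabeled row's class
    weights are proportional to its summed kernel similarity to the labeled
    samples of each class, so closer labeled samples count more."""
    W = np.zeros((X.shape[0], n_classes))
    lab = np.flatnonzero(labeled)
    unl = np.flatnonzero(~labeled)
    W[lab, y[lab]] = 1.0
    scores = gaussian_kernel(X[unl], X[lab], h) @ np.eye(n_classes)[y[lab]]
    scores += 1e-12                                   # guard against all-zero rows
    W[unl] = scores / scores.sum(axis=1, keepdims=True)
    return W


def kde_mutual_information(X, y, labeled, n_classes, h=1.0):
    """Resubstitution KDE estimate of I(S; Y) = H(S) - H(S | Y) with soft labels."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, h)                      # includes the self term K_ii = 1
    W = soft_labels(X, y, labeled, n_classes, h)
    h_s = -np.mean(np.log(K.mean(axis=1)))            # H(S) up to a kernel constant
    h_s_given_y = 0.0
    for c in range(n_classes):
        w = W[:, c]
        if w.sum() <= 0:
            continue
        dens = K @ w / w.sum()                        # soft class-conditional density
        h_s_given_y -= (w * np.log(np.maximum(dens, 1e-300))).sum() / n
    return h_s - h_s_given_y                          # kernel constants cancel here


def greedy_select(X, y, labeled, n_classes, k, h=1.0):
    """Forward search: repeatedly add the feature that maximises estimated MI."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining, key=lambda f: kde_mutual_information(
            X[:, selected + [f]], y, labeled, n_classes, h))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: feature 0 separates the two classes, features 1 and 2 are noise;
# only the first 10 of 60 labels are treated as known.
rng = np.random.default_rng(0)
n = 60
y = np.arange(n) % 2
X = np.column_stack([3.0 * y + rng.normal(0, 0.3, n),
                     rng.normal(0, 1, n), rng.normal(0, 1, n)])
labeled = np.zeros(n, dtype=bool)
labeled[:10] = True
print(greedy_select(X, y, labeled, n_classes=2, k=2))
```

Because the same kernel and bandwidth appear in both H(S) and H(S | Y), and the soft class weights of each sample sum to one, the kernel's normalizing constant cancels in the difference, which is why the sketch can use the unnormalized Gaussian.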
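Method (2) combines kernel-density-based posteriors with a sparse model. The NumPy sketch below illustrates one plausible combination under stated assumptions: a simplified semi-supervised KDE iteration (each unlabeled sample's posterior is the kernel-weighted average of the other samples' posteriors, with labeled samples clamped to their labels) produces soft labels F, and an L2,1-regularized least-squares model, min_W ||XW - F||_F^2 + gamma * ||W||_{2,1}, solved by iteratively reweighted least squares, ranks features by the row norms of W. The clamping rule, the objective, `gamma`, `h`, and the function names are hypothetical; the thesis's exact formulation may differ.

```python
import numpy as np


def sskde_posteriors(X, y, labeled, n_classes, h=1.0, n_iter=200, tol=1e-8):
    """Simplified semi-supervised KDE: an unlabeled sample's class posterior is
    the kernel-weighted average of the other samples' posteriors; labeled rows
    stay clamped to their one-hot labels. Iterates towards a fixed point."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq / (2.0 * h * h))
    np.fill_diagonal(K, 0.0)                          # a sample does not vote for itself
    P = K / (K.sum(axis=1, keepdims=True) + 1e-12)    # row-normalised kernel weights
    lab = np.flatnonzero(labeled)
    unl = np.flatnonzero(~labeled)
    F = np.full((X.shape[0], n_classes), 1.0 / n_classes)  # uniform start
    F[lab] = np.eye(n_classes)[y[lab]]
    if unl.size == 0:
        return F
    for _ in range(n_iter):
        update = P[unl] @ F                           # Jacobi-style update of unlabeled rows
        converged = np.abs(update - F[unl]).max() < tol
        F[unl] = update
        if converged:
            break
    return F


def l21_feature_scores(X, F, gamma=1.0, n_iter=50, eps=1e-8):
    """min_W ||Xc W - Fc||_F^2 + gamma * ||W||_{2,1} by iteratively reweighted
    least squares (Xc, Fc are column-centred to absorb an intercept).
    Returns the row norms of W, one relevance score per feature."""
    Xc = X - X.mean(axis=0)
    Fc = F - F.mean(axis=0)
    D = np.eye(X.shape[1])
    for _ in range(n_iter):
        # Stationarity condition: (Xc^T Xc + gamma D) W = Xc^T Fc
        W = np.linalg.solve(Xc.T @ Xc + gamma * D, Xc.T @ Fc)
        D = np.diag(1.0 / (2.0 * np.linalg.norm(W, axis=1) + eps))
    return np.linalg.norm(W, axis=1)


# Toy data: feature 0 separates the two classes, features 1 and 2 are noise;
# only the first 10 of 60 labels are treated as known.
rng = np.random.default_rng(0)
n = 60
y = np.arange(n) % 2
X = np.column_stack([3.0 * y + rng.normal(0, 0.3, n),
                     rng.normal(0, 1, n), rng.normal(0, 1, n)])
labeled = np.zeros(n, dtype=bool)
labeled[:10] = True
F = sskde_posteriors(X, y, labeled, n_classes=2)
scores = l21_feature_scores(X, F)
print(np.argsort(-scores))    # feature indices, most relevant first
```

The L2,1 penalty drives whole rows of W toward zero, so a feature is either used for all classes or dropped, which is what makes the row norms usable as feature scores. Centering X and F is a simple way to avoid fitting an explicit bias term.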
Keywords: semi-supervised feature selection, semi-supervised learning, kernel density estimation, mutual information, sparse model