Semi-supervised Feature Selection Based On Kernel Density Estimation

Posted on: 2019-12-13

Degree: Master

Type: Thesis

Country: China

Candidate: S Q Xu

Full Text: PDF

GTID: 2428330626952110

Subject: Computer technology

Abstract/Summary:

In the big data era, the rapid growth and diversity of data pose challenges for machine learning and data mining tasks. Feature selection is an important part of feature engineering; its purpose is to select a subset of features relevant to the task and to eliminate redundant features. On one hand, feature selection reduces computational cost and can improve model accuracy. On the other hand, a model built on fewer features is easier to interpret. In many applications, collecting data is easy, but collecting data with complete labels is relatively difficult. Training data therefore typically consists of a small amount of labeled data and a large amount of unlabeled data, which makes semi-supervised feature selection an important problem. Kernel density estimation is a non-parametric density estimation method that requires no prior assumptions about the data distribution. The density value at a sample is the average of the kernel contributions of all other samples. Kernel density estimation is currently widely used in image/video annotation, signal processing, network fault detection, and other fields. In this thesis, kernel density estimation is introduced into information-theoretic feature selection and into sparse-model-based semi-supervised feature selection, and two corresponding semi-supervised feature selection methods based on kernel density estimation are proposed.

(1) Semi-supervised feature selection method based on kernel density estimation entropy. In traditional information-theoretic feature selection, kernel density estimation avoids discretizing continuous data and thus avoids the information loss that discretization causes. We extend the feature selection method based on kernel density estimation entropy so that it applies to partially labeled data. The method selects features heuristically, using mutual information as the evaluation metric and the kernel function from kernel density estimation as the distance metric. Using the relationship between labeled and unlabeled data, the probability that each unlabeled sample belongs to each label class is computed on the principle that closer samples receive larger weights. The kernel density estimation entropy is thereby extended from the supervised setting to the semi-supervised setting. Classification experiments and multi-label learning experiments demonstrate the effectiveness of the method.

(2) Semi-supervised feature selection based on sparse model and kernel density estimation. Semi-supervised feature selection based on sparse and graph models uses a sparse model to select features and graph-based semi-supervised learning to learn the label probability distribution of the samples. Semi-supervised kernel density estimation is a semi-supervised learning method that extends the kernel-density-based posterior probability through Bayes' theorem, so that unlabeled data are handled in a unified way with labeled data. The posterior probabilities are solved iteratively. We introduce the idea of semi-supervised kernel density estimation into sparse-model-based semi-supervised feature selection. Experiments show that the proposed method performs well compared with semi-supervised feature selection based on sparse and graph models.
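The two kernel-based ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the Gaussian kernel, the single shared `bandwidth`, and the function names `gaussian_kernel`, `leave_one_out_density`, and `soft_labels` are all assumptions chosen for illustration. The first function computes each sample's density as the average kernel contribution of all other samples; the second assigns each unlabeled sample a class probability in which closer labeled samples receive larger weights.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Unnormalized Gaussian kernel between rows of a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def leave_one_out_density(x, bandwidth):
    """Density at each sample, averaged over the kernels of all OTHER samples."""
    n, d = x.shape
    k = gaussian_kernel(x, x, bandwidth)
    np.fill_diagonal(k, 0.0)  # exclude each sample's own contribution
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)  # Gaussian normalizer
    return k.sum(axis=1) / ((n - 1) * norm)


def soft_labels(x_labeled, y_labeled, x_unlabeled, n_classes, bandwidth):
    """Class probabilities for unlabeled samples from kernel-weighted votes.

    Assumed scheme: each labeled sample votes for its own class with a
    weight equal to its kernel value, so closer samples weigh more.
    """
    k = gaussian_kernel(x_unlabeled, x_labeled, bandwidth)  # (u, l)
    one_hot = np.eye(n_classes)[y_labeled]                   # (l, c)
    scores = k @ one_hot                                     # (u, c)
    totals = np.maximum(scores.sum(axis=1, keepdims=True), 1e-300)
    return scores / totals


if __name__ == "__main__":
    x_lab = np.array([[0.0, 0.0], [1.0, 1.0]])
    y_lab = np.array([0, 1])
    x_unl = np.array([[0.1, 0.0], [0.9, 1.0]])
    print(soft_labels(x_lab, y_lab, x_unl, n_classes=2, bandwidth=0.5))
    print(leave_one_out_density(np.vstack([x_lab, x_unl]), bandwidth=0.5))
```

In a semi-supervised entropy or mutual-information estimate, probabilities such as those returned by `soft_labels` could stand in for the hard class indicators of labeled samples; how the thesis combines them is not specified in the abstract.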

Keywords/Search Tags:

semi-supervised feature selection, semi-supervised learning, kernel density estimation, mutual information, sparse model

Related items

1	Research On Semi-supervised Sparse Feature Selection For Image Annotation In Web Space
2	Research On Semi-supervised Clustering And Classification Algorithm
3	The Research Of Facial Feature Extraction Method Based On Semi-supervised Learning
4	Application Of Semi-Supervised Learning Algorithm Based On Kernel Density In Video Semantic Annotation
5	Research Of Reliable Semi-supervised Classification
6	Research On Multi-view Adaptive Semi-supervised Feature Selection Algorithm
7	Research And Application Of Image Classification Algorithm Based On Semi-supervised Learning
8	Research On The Application Of Geometric Information In The Semi-supervised Learning
9	Research On Semi-Supervised Feature Selection Algorithms
10	Research On Semi-supervised Feature Sparse Selection Method Algorithm Based On Least Squares Regression