With the rapid development of information technology and the wide application of high-tech products, the data acquired in many industries are high-dimensional and highly complex in both quantity and content. These data contain rich and useful features, but also a large number of irrelevant and redundant features and noise, which degrades the performance of data analysis models, steadily weakens their generalization ability, and puts great pressure on data analysis and visualization. It is therefore necessary to eliminate irrelevant and redundant features from the data. Feature selection is a basic preprocessing step performed before the actual learning task: it reduces the dimensionality of the data by eliminating irrelevant or redundant features, thereby shortening model training time and improving model accuracy. On the other hand, most data obtained in practice are unlabeled, and manually labeling large amounts of high-dimensional data is expensive and impractical. When samples lack category labels, supervised feature selection methods are no longer applicable, so it is necessary to study unsupervised feature selection methods for unlabeled data. In recent years, owing to its effectiveness and low computational complexity, the Hilbert-Schmidt independence criterion (HSIC) has been shown to be applicable to feature selection problems. However, most HSIC-based feature selection methods suffer from the following limitations. First, these methods are usually applicable only to labeled data, which is undesirable because most data in real applications are unlabeled. Second, existing HSIC-based unsupervised feature selection methods only capture the general correlation between the selected features and an output value expressing the underlying clustering structure, while ignoring the redundancy between
different features. To address these two problems, the specific work of this paper is as follows:

1. The relevant basic knowledge of feature selection and HSIC is introduced in detail. The typical HSIC-based feature selection methods are systematically reviewed, and their advantages and disadvantages are analyzed.

2. A novel unsupervised feature selection algorithm named UFSHSIC is proposed. The algorithm first treats the input data themselves as labels, so as to better explore the correlation between a feature subset and the overall sample structure. HSIC is then used as the correlation criterion to measure both the correlation between features and the overall sample structure and the redundancy between features. Finally, a backward elimination strategy is used to select features. Experiments on seven UCI data sets verify the effectiveness of the proposed UFSHSIC algorithm.

3. An unsupervised feature selection algorithm based on HSIC Lasso, named UHSIC-Lasso, is proposed. The algorithm introduces a nonlinear Lasso regression model, HSIC Lasso, which transforms the feature selection problem into a continuous optimization problem. First, the algorithm uses a set of kernel functions to select the non-redundant features carrying the most information. Second, it exploits the robustness and sparsity brought by the l1-norm constraint to improve the algorithm. Finally, it efficiently computes the globally optimal solution by solving a Lasso optimization problem. In addition, the algorithm has a clear statistical interpretation: according to HSIC, it finds the minimally redundant features that have the greatest dependence on the output value. The effectiveness of the UHSIC-Lasso algorithm is verified by simulation experiments on UCI data sets.
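To make the UFSHSIC idea concrete, the following is a minimal sketch of an HSIC-based backward elimination, assuming Gaussian kernels and the biased empirical HSIC estimator tr(KHLH)/(n-1)^2. The function names, the kernel bandwidth, and the greedy scoring rule are illustrative assumptions for exposition, not the thesis's exact procedure; in particular, the whole data matrix X plays the role of the "label", as the abstract describes.

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    """Pairwise Gaussian (RBF) kernel matrix over the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC estimator: tr(K H L H) / (n - 1)^2."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    K = gaussian_kernel(X, sigma)
    L = gaussian_kernel(Y, sigma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def select_features_backward(X, k, sigma=1.0):
    """Illustrative backward elimination: use the full data X as the
    pseudo-label and repeatedly drop the feature whose removal keeps
    HSIC(remaining subset, X) highest, until k features remain."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > k:
        scores = [hsic(X[:, [g for g in remaining if g != f]], X, sigma)
                  for f in remaining]
        remaining.pop(int(np.argmax(scores)))   # best score = least loss
    return sorted(remaining)
```

With a duplicated (fully redundant) feature, this greedy criterion prefers to drop one of the duplicates, since removing it barely changes the dependence between the subset and the overall sample structure.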
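The continuous-optimization view behind UHSIC-Lasso can be sketched as follows, under stated assumptions: Gaussian kernels, the centered Gram matrix of the whole data as the unsupervised pseudo-label, and a simple non-negative coordinate-descent solver for the l1-penalized least-squares problem. All names and the solver choice here are illustrative, not the thesis's implementation.

```python
import numpy as np

def centered_gram(v, sigma=1.0):
    """Centered Gaussian Gram matrix of one feature vector, normalised to
    unit Frobenius norm (as in the HSIC Lasso formulation)."""
    d2 = (v[:, None] - v[None, :]) ** 2
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    n = len(v)
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    return Kc / np.linalg.norm(Kc)

def uhsic_lasso(X, lam=0.05, sigma=1.0, n_iter=200):
    """Sketch of unsupervised HSIC Lasso: regress the centered Gram matrix
    of the whole data (the pseudo-label) on per-feature Gram matrices with
    a non-negative l1 penalty, via coordinate descent. Returns one weight
    per feature; nonzero weights mark selected features."""
    n, d = X.shape
    # design matrix: each column is a vectorised per-feature Gram matrix
    A = np.stack([centered_gram(X[:, j], sigma).ravel() for j in range(d)],
                 axis=1)
    # pseudo-label: Gram matrix of all features, centered and normalised
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    L = np.exp(-d2 / (2.0 * sigma ** 2))
    H = np.eye(n) - np.ones((n, n)) / n
    Lc = H @ L @ H
    y = (Lc / np.linalg.norm(Lc)).ravel()
    alpha = np.zeros(d)
    col_sq = np.sum(A ** 2, axis=0)
    for _ in range(n_iter):
        for k in range(d):
            # residual with feature k's contribution removed
            r = y - A @ alpha + A[:, k] * alpha[k]
            # soft-threshold, clipped at zero (non-negativity constraint)
            alpha[k] = max(0.0, (A[:, k] @ r - lam) / col_sq[k])
    return alpha
```

Because the objective is a convex Lasso problem, any solver reaching its optimum attains the global solution, which is the statistical-interpretation point the abstract makes: the nonzero weights pick minimally redundant features with maximal HSIC dependence on the output.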