
Low-rank Sparsity Feature Reduction And Its Applications In Data Mining

Posted on: 2018-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: R Y Hu
Full Text: PDF
GTID: 2348330518456557
Subject: Computer Science and Technology
Abstract/Summary:
Data mining studies often use high-dimensional data to characterize the objects under analysis. For various reasons, such as the diversity of data sources, high-dimensional data usually contain irrelevant and redundant features, which increase storage and computation costs and easily lead to the "curse of dimensionality", decreasing the efficiency of data mining. Dimensionality reduction selects a subset of important features from all the features, and has been shown to alleviate these issues and to improve classifier performance as well; it has therefore been widely used in real applications. Previous dimensionality reduction methods fall into two groups: feature selection and subspace learning. Feature selection preserves the original structure of the data and selects relevant features from all the features, while subspace learning maps high-dimensional data into a low-dimensional feature space, preserving various structures of the data to remove the impact of outliers and irrelevant features. In short, the results of feature selection are interpretable, while subspace learning is more stable than feature selection.

This thesis combines subspace learning and feature selection in a single framework to address the problem that high-dimensional data may increase the rank of the data matrix. To this end, it uses a low-rank constraint and sparse learning to select a subset of features from high-dimensional data, and then applies the resulting low-dimensional data to classification and regression tasks on both single-view and multi-view data. The main contents and corresponding contributions of this thesis are as follows.

Based on the observation that self-expressive methods achieve significant classification performance, this thesis combines feature-level self-expression, a low-rank constraint, and sparse learning to propose a novel unsupervised dimensionality reduction method, RS_FS, which converts the problem of unsupervised dimensionality reduction into a supervised one. Specifically, under the assumption that unlabeled data have latent labels, RS_FS first employs k-means clustering to obtain pseudo labels for the data, and then uses feature-level self-expression to capture the self-similarity among features and sparse regularization to generate a sparse coefficient matrix. In the resulting objective function, subspace learning preserves the global structure of the data so that important features keep non-zero coefficients, while the low-rank constraint induces an ordering of the features. Compared with competing methods on public datasets, the proposed RS_FS achieves the best classification performance.

The diversity of data sources leads to multi-view representations, so this thesis further proposes a new dimensionality reduction method (SLR_FS for short) for multi-view data. SLR_FS applies sparse reconstruction to yield a coefficient matrix for each view of the data, employs sparse learning to remove the adverse effect of noisy samples and redundant features, and designs a low-rank constraint to preserve the global structure of the data in each view. Meanwhile, SLR_FS conducts a linear regression on all the resulting coefficient matrices and applies subspace learning to further adjust them. Compared with competing methods, the proposed SLR_FS achieves the best performance on all evaluation metrics.

In summary, this thesis proposes dimensionality reduction methods for both single-view and multi-view data, employing low-rank and sparse learning techniques and aiming to select a subset of representative features from all the features. For a fair comparison between the proposed and competing methods, the same experimental settings are used for all methods in both classification and regression tasks, with three kinds of evaluation metrics for each task. Across all comparisons, the proposed methods achieve the best performance. Future work will focus on designing new deep learning techniques to improve the above dimensionality reduction frameworks.
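The RS_FS pipeline summarized above (pseudo labels from k-means, feature-level self-expression with a sparsity penalty, a low-rank step, then feature ranking) can be sketched roughly as follows. This is a simplified illustration, not the thesis's exact formulation: the function name `rsfs_feature_ranking` is ours, an l1 (Lasso) penalty stands in for the sparse regularizer, and SVD truncation stands in for the low-rank constraint.

```python
# Hedged sketch of an RS_FS-style unsupervised feature ranking.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso

def rsfs_feature_ranking(X, n_clusters=3, alpha=0.05, rank=None):
    """Rank the features of X (n_samples x n_features) by self-representation weight."""
    # Step 1: pseudo labels via k-means (the thesis uses these to turn the
    # unsupervised problem into a supervised one; returned here for reference).
    pseudo = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)

    n_features = X.shape[1]
    W = np.zeros((n_features, n_features))
    # Step 2: express each feature as a sparse combination of the other features.
    for j in range(n_features):
        others = np.delete(np.arange(n_features), j)
        lasso = Lasso(alpha=alpha, max_iter=5000)
        lasso.fit(X[:, others], X[:, j])
        W[others, j] = lasso.coef_

    # Step 3: crude low-rank step -- keep only the top singular directions of W.
    if rank is not None:
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        s[rank:] = 0.0
        W = U @ np.diag(s) @ Vt

    # Features with large row norms in W contribute most to reconstructing
    # the other features; rank them in descending order of that score.
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(scores)[::-1], pseudo
```

A redundant feature (e.g. a near-copy of another column) tends to receive large self-representation weight, which is why the row norms of the coefficient matrix are a usable importance score.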
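Similarly, the SLR_FS idea for multi-view data (a sparse reconstruction coefficient matrix per view, a low-rank constraint on each, then a step tying the views together) can be sketched as below. All function names are illustrative, and the simple averaging consensus is a stand-in for the thesis's regression-based adjustment of the coefficient matrices.

```python
# Hedged sketch of an SLR_FS-style multi-view coefficient construction.
import numpy as np
from sklearn.linear_model import Lasso

def sparse_self_representation(X, alpha=0.05):
    """Sample-level sparse reconstruction: X[i] ~ sum_j C[j, i] * X[j], zero diagonal."""
    n = X.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        model = Lasso(alpha=alpha, max_iter=5000)
        model.fit(X[others].T, X[i])  # express sample i by the other samples
        C[others, i] = model.coef_
    return C

def truncate_rank(C, rank):
    """Low-rank step: zero out all but the top `rank` singular values."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    s[rank:] = 0.0
    return U @ np.diag(s) @ Vt

def slrfs_consensus(views, alpha=0.05, rank=5):
    """Per-view sparse + low-rank coefficient matrices, then a consensus matrix."""
    Cs = [truncate_rank(sparse_self_representation(V, alpha), rank) for V in views]
    consensus = np.mean(Cs, axis=0)  # simple stand-in for the regression step
    return Cs, consensus
```

Because every view shares the same samples, each per-view coefficient matrix has the same shape, which is what makes a cross-view consensus (or regression) over the matrices possible even when the views have different feature dimensions.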
Keywords/Search Tags: Data type analysis, Low-rank representation, Feature selection, Sparse reconstruction technology, Subspace learning