
Low-rank Sparsity Feature Reduction And Its Applications In Data Mining

Posted on: 2018-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: R Y Hu
Full Text: PDF
GTID: 2348330518456557
Subject: Computer Science and Technology
Abstract/Summary:
Data mining studies often use high-dimensional data to characterize the objects under analysis. For various reasons, such as the diversity of data sources, high-dimensional data usually contain irrelevant and redundant features, which increase storage and computation costs and easily lead to the "curse of dimensionality", decreasing the efficiency of data mining. Dimensionality reduction selects a subset of important features from all the features, and has been shown to alleviate these issues and to improve classifier performance as well; it has therefore been widely used in real applications. Previous dimensionality reduction methods fall into two groups: feature selection and subspace learning. Feature selection preserves the original structure of the data and selects relevant features from all the features, while subspace learning maps high-dimensional data into a low-dimensional feature space, preserving various structures of the data to remove the impact of outliers and irrelevant features. In short, the results of feature selection are interpretable, while subspace learning is more stable than feature selection.

This thesis combines subspace learning and feature selection in a single framework to address the problem that high-dimensional data may increase the rank of the data matrix. To this end, it uses a low-rank constraint and sparse learning to select a subset of features from high-dimensional data, and then applies the resulting low-dimensional data to classification and regression tasks on both single-view and multi-view data. The main contents and corresponding contributions of this thesis are as follows.

Based on the observation that self-expressive methods achieve significant classification performance, this thesis combines feature-level self-expression, a low-rank constraint, and sparse learning to propose a novel unsupervised dimensionality reduction method, RS_FS, which converts the problem of unsupervised dimensionality reduction into a supervised one. Specifically, under the assumption that unlabeled data have latent labels, RS_FS first employs k-means clustering to obtain pseudo labels for the data, and then uses feature-level self-expression to capture the self-similarity among features and sparse regularization to generate a sparse coefficient matrix. In the resulting objective function, subspace learning preserves the global structure of the data so that important features keep non-zero coefficients, while the low-rank constraint induces an ordering of the features. Compared with competing methods on public datasets, the proposed RS_FS achieves the best classification performance.

The diversity of data sources leads to multi-view representations, so this thesis further proposes a new dimensionality reduction method (SLR_FS for short) for multi-view data. SLR_FS applies sparse reconstruction to yield a coefficient matrix for each view of the data, employs sparse learning to remove the adverse effect of noisy samples and redundant features, and designs a low-rank constraint to preserve the global structure of the data in each view. Meanwhile, SLR_FS conducts a linear regression on all the resulting coefficient matrices and applies subspace learning to further adjust them. Compared with competing methods, the proposed SLR_FS achieves the best performance on all evaluation metrics.

In summary, this thesis proposes dimensionality reduction methods for both single-view and multi-view data, employing low-rank and sparse learning techniques and aiming to select a subset of representative features from all the features. For a fair comparison between the proposed and competing methods, the same experimental settings are used for all methods in both classification and regression tasks, with three kinds of evaluation metrics for each task. Across all comparisons, the proposed methods achieve the best performance. Future work will focus on designing new deep learning techniques to improve the above dimensionality reduction frameworks.
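The RS_FS pipeline summarized above (pseudo labels from k-means, feature-level self-expression with a sparsity penalty, a low-rank step, then feature ranking) can be sketched roughly as follows. This is a simplified illustration, not the thesis's exact formulation: the function name `rsfs_feature_ranking` is ours, an l1 (Lasso) penalty stands in for the sparse regularizer, and SVD truncation stands in for the low-rank constraint.

```python
# Hedged sketch of an RS_FS-style unsupervised feature ranking.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso

def rsfs_feature_ranking(X, n_clusters=3, alpha=0.05, rank=None):
    """Rank the features of X (n_samples x n_features) by self-representation weight."""
    # Step 1: pseudo labels via k-means (the thesis uses these to turn the
    # unsupervised problem into a supervised one; returned here for reference).
    pseudo = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)

    n_features = X.shape[1]
    W = np.zeros((n_features, n_features))
    # Step 2: express each feature as a sparse combination of the other features.
    for j in range(n_features):
        others = np.delete(np.arange(n_features), j)
        lasso = Lasso(alpha=alpha, max_iter=5000)
        lasso.fit(X[:, others], X[:, j])
        W[others, j] = lasso.coef_

    # Step 3: crude low-rank step -- keep only the top singular directions of W.
    if rank is not None:
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        s[rank:] = 0.0
        W = U @ np.diag(s) @ Vt

    # Features with large row norms in W contribute most to reconstructing
    # the other features; rank them in descending order of that score.
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(scores)[::-1], pseudo
```

A redundant feature (e.g. a near-copy of another column) tends to receive large self-representation weight, which is why the row norms of the coefficient matrix are a usable importance score.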
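Similarly, the SLR_FS idea for multi-view data (a sparse reconstruction coefficient matrix per view, a low-rank constraint on each, then a step tying the views together) can be sketched as below. All function names are illustrative, and the simple averaging consensus is a stand-in for the thesis's regression-based adjustment of the coefficient matrices.

```python
# Hedged sketch of an SLR_FS-style multi-view coefficient construction.
import numpy as np
from sklearn.linear_model import Lasso

def sparse_self_representation(X, alpha=0.05):
    """Sample-level sparse reconstruction: X[i] ~ sum_j C[j, i] * X[j], zero diagonal."""
    n = X.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        model = Lasso(alpha=alpha, max_iter=5000)
        model.fit(X[others].T, X[i])  # express sample i by the other samples
        C[others, i] = model.coef_
    return C

def truncate_rank(C, rank):
    """Low-rank step: zero out all but the top `rank` singular values."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    s[rank:] = 0.0
    return U @ np.diag(s) @ Vt

def slrfs_consensus(views, alpha=0.05, rank=5):
    """Per-view sparse + low-rank coefficient matrices, then a consensus matrix."""
    Cs = [truncate_rank(sparse_self_representation(V, alpha), rank) for V in views]
    consensus = np.mean(Cs, axis=0)  # simple stand-in for the regression step
    return Cs, consensus
```

Because every view shares the same samples, each per-view coefficient matrix has the same shape, which is what makes a cross-view consensus (or regression) over the matrices possible even when the views have different feature dimensions.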
Keywords/Search Tags: Data type analysis, Low-rank representation, Feature selection, Sparse reconstruction technology, Subspace learning