
Research on Data Mining Algorithms Based on Low-Rank Sparse Subspace

Posted on: 2018-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: L F Yang
Full Text: PDF
GTID: 2348330518456590
Subject: Computer Science and Technology
Abstract/Summary:
High-dimensional data contain large numbers of features as well as considerable redundancy, noise, and outliers, which make the spatial structure of the data more complex and make it hard to exploit the intrinsic relational structures of the data when constructing a data mining model. Building the affinity matrix, i.e., capturing the correlation between any two samples or features, is a key step in discovering the relational structures inherent in the data; however, affinity matrix learning is sensitive to noise and outliers. Sparse learning can penalize the affinity matrix toward a sparse structure, in which two highly correlated samples or features receive a large value while weakly correlated ones receive a small or zero value. A sparse affinity matrix therefore reflects the correlations among samples or features effectively and makes the model robust against the interference of noise and outliers. Moreover, the noise and outliers inherent in the data tend to inflate the rank of the affinity matrix, so that the model cannot capture the true low-rank structure of the data. It is therefore important to use a low-rank constraint to explicitly control the rank of the affinity matrix during learning.

However, current data mining algorithms have several drawbacks. First, most of them consider only part of the relational structures inherent in the data: a model typically exploits either global or local structure information, and few methods combine sparse learning, low-rank constraints, and subspace learning in a unified framework to capture complementary structure information. Second, current algorithms usually complete a specific task through multiple separate, independent steps; although each step may reach its own best solution, this cannot guarantee that the algorithm obtains the best global solution. To this end, this thesis mainly studies sparse learning, low-rank constraints, and subspace learning to resolve these drawbacks, and proposes a novel multi-output regression algorithm and a new subspace clustering algorithm to handle data mining for high-dimensional data effectively. The major contributions of this thesis are summarized as follows:

1) A new multi-output regression algorithm based on a low-rank constraint and feature selection, called Low-rank Feature Reduction for multi-output regression (LFR for short), is proposed to address the fact that current multi-output regression algorithms do not fully utilize the multiple relational structures inherent in high-dimensional data. Specifically, LFR applies l2,1-norm regularization to sparsify the regression coefficient matrix, so that it can discover feature correlations and conduct feature selection to eliminate the interference of noise and remove redundant features; in addition, LFR represents the regression coefficient matrix as the product of two new low-rank matrices, so that it can capture the correlations among the output variables; furthermore, LFR combines the l2,1-norm with the loss function to conduct sample selection, so that it can remove the interference of outliers and improve the regression model by considering the correlations among samples (a sketch of such an objective is given below). Experiments on a large number of multi-output regression datasets show that the proposed LFR algorithm achieves good predictive ability for multi-output regression on high-dimensional data.
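The abstract does not state the LFR objective explicitly; the following is a minimal sketch of an objective consistent with the description above, in which the symbols $X$, $Y$, $A$, $B$, $r$, and $\lambda$ are assumptions introduced here for illustration rather than notation taken from the thesis:

$$\min_{A,B}\; \|Y - XAB\|_{2,1} \;+\; \lambda\,\|AB\|_{2,1}, \qquad A\in\mathbb{R}^{d\times r},\; B\in\mathbb{R}^{r\times c},\; r<\min(d,c),$$

where $X\in\mathbb{R}^{n\times d}$ is the data matrix, $Y\in\mathbb{R}^{n\times c}$ is the output matrix, and $W=AB$ is the regression coefficient matrix. Under this reading, the l2,1-norm loss $\sum_{i}\|y_i - x_iAB\|_2$ down-weights outlier samples (sample selection), the l2,1-norm regularizer drives whole rows of $W$ to zero (feature selection), and the factorization $W=AB$ bounds the rank of $W$ by $r$, which couples the output variables.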
2) A novel Low-rank Sparse Subspace clustering algorithm (LSS for short), based on low-rank constraints, sparse learning, and subspace learning, is proposed. Current subspace clustering algorithms usually complete the clustering task in two separate steps, i.e., first constructing the similarity matrix and then conducting spectral clustering, which cannot guarantee that the model reaches the final best global solution. This thesis therefore proposes the LSS algorithm in the fourth chapter to solve this problem. Specifically, LSS conducts feature selection to remove noise and redundancy by using sparse learning to sparsify the coefficient matrix; it learns similarity matrices from both the original data space and a low-dimensional subspace of the original data, and iteratively optimizes these two similarity matrices so that they reflect the true similarity of the data; furthermore, LSS obtains the ideal similarity matrix and the final best clustering result simultaneously by imposing a low-rank constraint on the Laplacian matrix of the similarity matrix during the iterative optimization (see the sketch after the concluding paragraph below). Extensive clustering experiments verify that the proposed LSS algorithm performs well when clustering high-dimensional data.

In summary, this thesis studies the use of sparse learning, low-rank constraints, and subspace learning in a unified framework to address the drawbacks of current data mining algorithms, and proposes a new multi-output regression algorithm and a new subspace clustering algorithm. Both algorithms are verified through extensive experiments on real datasets in terms of various evaluation criteria.
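Similarly, the abstract does not give the LSS objective; the following is a minimal sketch of a formulation consistent with the description of LSS above, in which the symbols $X$, $P$, $S$, $L_S$, $c$, $\lambda_1$, and $\lambda_2$ are assumptions introduced here for illustration:

$$\min_{P,S}\; \sum_{i,j}\big(\|x_i-x_j\|_2^2+\|P^{\top}x_i-P^{\top}x_j\|_2^2\big)\,s_{ij} \;+\; \lambda_1\|P\|_{2,1} \;+\; \lambda_2\|S\|_F^2 \quad \text{s.t.}\;\; s_i^{\top}\mathbf{1}=1,\; s_{ij}\ge 0,\; \operatorname{rank}(L_S)=n-c,$$

where $P$ is a projection matrix whose l2,1-norm row sparsity performs feature selection, the first term learns the similarity weights $s_{ij}$ jointly from the original space and the learned subspace, and $L_S = D_S - (S+S^{\top})/2$ is the Laplacian of the similarity matrix $S$ with degree matrix $D_S$. The constraint $\operatorname{rank}(L_S)=n-c$ forces the graph defined by $S$ to have exactly $c$ connected components, so the clustering result is obtained together with $S$ in a single optimization rather than through a separate spectral clustering step.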
Keywords/Search Tags: Data Mining, subspace clustering, multi-output regression, sparse learning, low-rank constraint, subspace learning, feature selection