
Based On The Optimal Points In Sparse Unsupervised Learning Algorithm

Posted on: 2011-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Gu
Full Text: PDF
GTID: 2190360302974659
Subject: Computer application technology
Abstract/Summary:
As a multivariate statistical analysis method, principal component analysis (PCA) is widely used in data processing and dimension reduction. It diagonalizes the covariance matrix of the original data in order to find the most important elements and structures, remove noise and redundancy, and thereby simplify the data. However, when regressing dependent variables on independent variables in multivariate analysis, the resulting coefficients are typically all non-zero, so such models lack interpretability. The principal components extracted by PCA, being linear combinations of all the original variables, suffer from the same defect. Tibshirani proposed the lasso method in 1996, which adds an l1-norm penalty to the regression coefficients so that some coefficients shrink to exactly zero automatically; this improves overall accuracy while yielding an interpretable model. Building on the lasso, Zou et al. proposed the elastic net criterion, which imposes l1- and l2-norm penalties simultaneously and overcomes the p > n case that the lasso cannot handle. Jolliffe et al. extended these ideas to principal component analysis and developed various methods for obtaining sparse loadings, the main idea being to attach l1- and/or l2-norm penalties to the loadings.

This paper explores the sparsity problem from another point of view. Drawing on the optimal scoring formulation for unsupervised learning given by Zhang in 2009, it proposes the SPCA-OS method for obtaining sparse loadings. Experiments show that, at almost the same cumulative variance, SPCA-OS obtains sparser loadings than the other methods. As a dual problem of SPCA, this paper also proposes SPCO. Compared with sparse loadings, this method achieves dimension reduction and sparse coordinates simultaneously. Experiments demonstrate that SPCO yields sparse coordinates in the reduced data matrix and can handle the p >> n case.
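The general effect of l1 penalization on loadings can be illustrated with a minimal sketch. This is not the thesis's SPCA-OS or SPCO method; it uses scikit-learn's generic `SparsePCA` (l1-penalized dictionary-learning formulation) against ordinary `PCA` on synthetic data, purely to show how an l1 penalty drives loading entries to exact zero:

```python
# Minimal sketch (not SPCA-OS): ordinary PCA loadings are dense, while
# l1-penalized sparse PCA loadings contain exact zeros.
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.RandomState(0)
# Synthetic data: 50 samples, 10 variables; a shared signal inflates
# the variance of the first three variables.
X = rng.randn(50, 10)
X[:, :3] += 3 * rng.randn(50, 1)

pca = PCA(n_components=2).fit(X)
spca = SparsePCA(n_components=2, alpha=2.0, random_state=0).fit(X)

dense_zeros = int(np.sum(pca.components_ == 0))
sparse_zeros = int(np.sum(spca.components_ == 0))
print("zero entries in PCA loadings:", dense_zeros)
print("zero entries in sparse PCA loadings:", sparse_zeros)
```

The dense PCA loadings mix all ten variables into every component, whereas the penalized loadings keep only a few non-zero entries per component, which is what makes the resulting components interpretable.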
Both methods are verified on several UCI datasets. Together they provide new ways to perform dimension reduction while retaining an interpretable model.
Keywords/Search Tags: Sparse Principal Component Analysis, Sparse Principal Coordinate Analysis, Dimension Reduction