Large Margin Of Group Sparse Subspace For Feature Selection

Posted on: 2014-12-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Liu
Full Text: PDF
GTID: 1268330392471505
Subject: Computer application technology

Abstract/Summary:
With the rapid development of information technology, biotechnology, and Internet technology, high-dimensional data are becoming increasingly common. Feature selection is an effective method for handling such data and has attracted more and more researchers. The goal of feature selection is to find, within a given feature set, a subset that represents the original data well; the learning algorithm that follows is directly affected by the feature selection performed on high-dimensional data.

In this dissertation, we investigate feature selection methods and apply them to high-dimensional data. The main content and contributions include:

① We use a locally linear method to represent the nonlinear structure of high-dimensional data. The key idea is to decompose an arbitrarily complex nonlinear problem into a set of locally linear ones: we take each sample's nearest neighbor with the same label and its nearest neighbor with a different label to approximate the nonlinear high-dimensional data structure. The method is simple, efficient, and intuitive.

② We establish a large-margin model of group sparse subspace for feature selection (GSLM). Each sample's local nearest neighbors are projected into the subspace, and the margin is defined there: the sample's distance to its nearest neighbor with a different label and its distance to its nearest neighbor with the same label are both measured in the subspace, their difference is taken as the sample's margin, and the sum of these margins over all samples is the total margin. Maximizing the margin makes the distance to the different-label nearest neighbor as large as possible and the distance to the same-label nearest neighbor as small as possible, preserving nearest-neighbor information so that useful features are selected. To make the features selected by the subspace more interpretable and the model more robust, we incorporate the L2,1 norm into the objective function. We propose an efficient algorithm to solve the objective function; it obtains a local optimal solution. Comprehensive experiments verify that the model selects good features and that the corresponding solving algorithm is very efficient.

③ We establish a large-margin model of trace-ratio group sparse subspace for feature selection (TR-GSLM). This method defines the margin as the ratio of the different-label nearest-neighbor distances to the same-label nearest-neighbor distances: the larger this margin, the smaller the same-label nearest-neighbor distances and the larger the different-label ones. We use the trace ratio to form the objective function and then incorporate the L2,1 norm into it. For this non-convex objective function we propose a novel algorithm that obtains the global optimal solution, and we prove the convergence of the algorithm. Comprehensive experiments verify that the model selects better features than GSLM, although its solving algorithm is less efficient.

④ We establish an enhanced large-margin model of trace-ratio group sparse subspace for feature selection (ETR-GSLM). We substitute variables to improve the efficiency of the TR-GSLM algorithm; although the objective function is extremely complex and non-convex, the algorithm still obtains the global optimal solution. To guarantee the positive definiteness of the sample marginal matrix, we present a new correcting method for indefinite matrices (one standard correction is sketched after the experimental summary below). If the marginal matrix is positive definite, the algorithm converges.
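To make the neighbor construction in ① and the subspace margin with L2,1 regularization in ② concrete, the sketch below computes each sample's near-hit (nearest same-label neighbor) and near-miss (nearest different-label neighbor), then evaluates a margin in the subspace spanned by a projection matrix W plus a row-sparsity penalty. All names here (`nearest_hit_miss`, `gslm_objective`, the weight `lam`) are illustrative assumptions, not the dissertation's notation, and the objective is a minimal sketch rather than the GSLM formulation itself.

```python
import numpy as np

def nearest_hit_miss(X, y, i):
    """Return indices of sample i's nearest same-label neighbor
    (near-hit) and nearest different-label neighbor (near-miss)."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf                              # exclude the sample itself
    same = (y == y[i])
    hit = np.flatnonzero(same)[np.argmin(d[same])]
    miss = np.flatnonzero(~same)[np.argmin(d[~same])]
    return hit, miss

def gslm_objective(X, y, W, lam=0.1):
    """Negative total subspace margin plus an L2,1 penalty on W.

    Margin of sample i: ||W^T (x_i - x_miss)|| - ||W^T (x_i - x_hit)||.
    Minimizing the negative margin pushes near-misses apart and pulls
    near-hits together in the subspace spanned by W's columns.
    """
    margin = 0.0
    for i in range(X.shape[0]):
        hit, miss = nearest_hit_miss(X, y, i)
        margin += (np.linalg.norm(W.T @ (X[i] - X[miss]))
                   - np.linalg.norm(W.T @ (X[i] - X[hit])))
    l21 = np.linalg.norm(W, axis=1).sum()      # sum of row L2 norms: row sparsity
    return -margin + lam * l21

# Toy usage on random two-class data.
X = np.random.randn(40, 6)
y = np.repeat([0, 1], 20)
W = np.random.randn(6, 2)                      # project 6 features to 2 dimensions
print(gslm_objective(X, y, W))
```

After minimizing such an objective (the dissertation's actual solver is not reproduced here), the features whose rows of W carry the largest L2 norms would be the ones selected; rows driven toward zero by the L2,1 penalty mark discarded features.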
Comprehensive experiments compare the performance of the proposed algorithms with five state-of-the-art algorithms: RFS, SPFS, mRMR, TR, and LLFS. Our algorithms achieve better performance than all five and are not sensitive to the kernel function parameters or the regularization parameters. Compared with LLFS and TR, the proposed GSLM algorithm has competitive performance with significantly faster computation.
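Returning to the indefinite-matrix correction in ④: the abstract does not spell out the correcting rule, so the sketch below shows one standard correction purely as an assumed illustration, namely symmetrizing the matrix and clipping its eigenvalues from below. The dissertation's own method may differ.

```python
import numpy as np

def make_positive_definite(M, eps=1e-8):
    """Project a (nearly) symmetric matrix onto the positive definite
    cone by clipping its eigenvalues from below. One common correction;
    the dissertation's own rule may differ."""
    M = (M + M.T) / 2.0                        # enforce exact symmetry
    vals, vecs = np.linalg.eigh(M)
    vals = np.maximum(vals, eps)               # lift non-positive eigenvalues
    return vecs @ np.diag(vals) @ vecs.T
```

With the marginal matrix corrected this way, the positive definiteness that the convergence statement in ④ relies on would hold by construction.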
Keywords/Search Tags: Large Margin, Feature Selection, Group Sparsity, Subspace Learning, High-Dimensional Data Processing