
Feature Selection Based On Original Data Correlation

Posted on: 2018-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: X M Wang
Full Text: PDF
GTID: 2347330533957207
Subject: Applied statistics
Abstract/Summary:
In feature selection problems, Lasso, least angle regression and stepwise regression (such as forward stepwise regression) can each describe the process of feature selection, but the process each of them describes is flawed. Least angle regression and modified least angle regression record only the points at which variables are selected or deleted; what happens between those points is not known, so least angle regression gives an incomplete picture of the data sparsification process. With forward stepwise methods, parts of the process are easily missed if the step length is too large, while a small step length makes the computation slow. A complete sparsification process can be obtained by solving the Lasso for all values of its parameter; however, the Lasso parameter is continuous, so a very large number of parameter values is needed to recover the whole process, and the Lasso itself is a difficult problem to solve. This paper proposes a feature selection method based on the correlation of the original data. The method (the formula method) uses the idea of modified least angle regression to select features but does not center the response variable, which yields a correspondence between the correlation values and the Lasso parameter. Once the algorithm has run, this correspondence gives an explicit solution of the Lasso for any given parameter. The method therefore not only improves the accuracy of the Lasso solution but also reduces the computation time when the Lasso is evaluated over a large grid of parameter values. Applying the formula method to a diabetes data set and comparing it with the coordinate descent method and a quadratic approximation algorithm, we found that the formula method has the highest accuracy. We also compared the running time of the three algorithms for different dimensions, different sample sizes and different numbers of parameter grid points. As the dimension, the sample size and the number of grid points increase, the formula method takes the least time, and its running time grows much more slowly than that of the other two methods. The idea behind our algorithm can also be used to explain other methods for solving the Lasso, such as the coordinate descent algorithm.
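The link between correlation values and the Lasso parameter mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's formula method, whose details are not given here; it is a plain coordinate descent solver, written under the assumption that the Lasso objective is (1/2)||y - Xb||^2 + lam*||b||_1 with no intercept and an uncentered response. The toy data and the names `lasso_cd`, `soft_threshold` and `correlations` are hypothetical. At a solution, the correlation of every feature with the residual is at most `lam` in absolute value, with equality for the selected features, and the solution is all zeros once `lam` reaches the largest absolute correlation between the features and the response.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def correlations(X, resid):
    """Inner product of each column of X with the residual vector."""
    n, p = len(X), len(X[0])
    return [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]


def lasso_cd(X, y, lam, n_iter=500):
    """Coordinate descent for (1/2)*||y - X b||^2 + lam*||b||_1 (assumed form).

    X is a list of rows and y a list of responses; nothing is centered.
    Returns the coefficient list and the final residual y - X b.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual for the starting point b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta, resid


# Hypothetical toy data (6 samples, 3 features), for illustration only.
X = [[1.0, 0.5, -0.2],
     [0.3, -1.0, 0.8],
     [-0.7, 0.2, 1.1],
     [1.2, 0.9, 0.4],
     [-0.4, -0.6, -1.0],
     [0.6, -0.1, 0.3]]
y = [1.5, -0.4, 0.2, 2.3, -1.6, 0.7]

# Largest absolute correlation with the raw response: the smallest
# parameter value at which every coefficient is zero.
lam_max = max(abs(c) for c in correlations(X, y))
beta_zero, _ = lasso_cd(X, y, lam_max)

# A smaller parameter value selects at least one feature.
lam = 1.0
beta, resid = lasso_cd(X, y, lam)
corr = correlations(X, resid)

print("lam_max:", lam_max)
print("beta at lam_max:", beta_zero)
print("beta at lam = 1.0:", beta)
print("residual correlations:", corr)
```

In this sketch the residual correlations of the selected features sit at `lam` in absolute value while those of unselected features stay below it, which is the kind of correspondence that makes it possible to read off a Lasso solution from correlation values rather than re-solving for every grid point.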
Keywords/Search Tags:Lasso, Least angle regression, Feature selection, Coordinate descent method