Variable Selection Algorithm Based On Variable Selection Deviation

Posted on:2017-05-08

Degree:Master

Type:Thesis

Country:China

Candidate:S B Wang

Full Text:PDF

GTID:2308330485986007

Subject:Computer software and theory

Abstract/Summary:

With the development of big data, data sets are growing in dimensionality and contain more and more redundancy, which makes it extremely difficult to extract valuable information from them. Variable selection is therefore necessary before modeling the data. When the model is assumed to be linear, many variable selection algorithms are available, such as Lasso, MCP, and SCAD. The model selected by Lasso generally contains many redundant variables, the model selected by MCP may lack some important variables, and the model selected by SCAD is too far from the potential true model. These three algorithms are therefore barely satisfactory in some fields.

In this thesis, we introduce the concept of variable selection deviation (VSD), which measures the distance between a model and the potential true model and can be used to delete redundant variables while preserving important ones. Two algorithms are built on it: a variable selection algorithm based on variable selection deviation (VS-Based-On-VSD) and a variable ranking algorithm based on variable selection deviation (VR-Based-On-VSD). The best variable subset selected by VS-Based-On-VSD attains the minimum VSD value, and its symmetric difference from the potential true model is the smallest. This subset contains as little redundant information as possible and as much as possible of the useful structural information hidden in the data. We also provide a method for finding the variable subset with the smallest VSD value and prove mathematically that it is globally optimal.

In VR-Based-On-VSD, the variables are weighted by variable selection deviation, and the selected variable subset consists of the variables whose weight is larger than a threshold value. The selected subset therefore depends on the threshold: when the threshold equals 0.5, VR-Based-On-VSD selects the same subset as VS-Based-On-VSD. If the threshold is less than 0.5, the subset selected by VR-Based-On-VSD will include more useful information that can contribute to prediction and classification of unknown samples.

The two new algorithms are compared with three traditional variable selection algorithms (Lasso, MCP, and SCAD). When the noise level is not high, the prediction ability of VS-Based-On-VSD is equal to that of Lasso and higher than that of MCP and SCAD, while its selected variable subset contains fewer redundant variables than that of Lasso. The best variable subset selected by VS-Based-On-VSD is therefore closer to the potential true model, and VR-Based-On-VSD can effectively describe the data set.
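The abstract does not give the formula for VSD, so the following Python sketch rests on an assumption: VSD of a subset is taken to be the weighted sum of the sizes of its symmetric differences from a set of candidate models, with candidate weights that sum to 1, and the weight of a variable is taken to be the total weight of the candidates that contain it. The function names, the candidate models, and the weights are all hypothetical and chosen only for illustration. Under that assumption, the sketch shows how thresholding the variable weights at 0.5 gives the same subset as a brute-force search for the minimum VSD.

```python
from itertools import combinations


def vsd(subset, candidates, weights):
    """Assumed VSD: weighted size of the symmetric difference between
    `subset` and each candidate model (not taken from the thesis)."""
    return sum(w * len(set(subset) ^ set(m)) for m, w in zip(candidates, weights))


def variable_weights(candidates, weights, n_vars):
    """Weight of variable j = total weight of the candidates that contain j."""
    return [
        sum(w for m, w in zip(candidates, weights) if j in m)
        for j in range(n_vars)
    ]


def select_by_threshold(var_w, threshold=0.5):
    """Ranking-style selection: keep variables whose weight exceeds the threshold."""
    return {j for j, w in enumerate(var_w) if w > threshold}


def brute_force_min_vsd(candidates, weights, n_vars):
    """Exhaustive search over all subsets; only feasible for small n_vars."""
    all_subsets = (
        set(c) for r in range(n_vars + 1) for c in combinations(range(n_vars), r)
    )
    return min(all_subsets, key=lambda s: vsd(s, candidates, weights))


# Hypothetical candidate models over 4 variables (indices 0..3) and their weights.
candidates = [{0, 1}, {0, 1, 2}, {0, 3}, {1, 2}]
weights = [0.4, 0.3, 0.2, 0.1]

var_w = variable_weights(candidates, weights, n_vars=4)
by_threshold = select_by_threshold(var_w, threshold=0.5)
by_search = brute_force_min_vsd(candidates, weights, n_vars=4)
looser = select_by_threshold(var_w, threshold=0.3)

print(by_threshold, by_search, looser)
```

Under the assumed definition, VSD splits into one term per variable: including variable j costs one minus its weight, and excluding it costs its weight. Including exactly the variables with weight above 0.5 therefore minimizes the sum, which is consistent with the abstract's statement about the 0.5 threshold. Lowering the threshold, as in the last call, can only add variables to the selected subset.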

Keywords/Search Tags:

Variable Selection Deviation, Variable Selection, Symmetric Difference, Variable Ranking

Related items

1	Research On Variable Selection In Data Mining
2	The Study On Variable Selection Methods Of Soft-Sensor Technique
3	Variable Selection For Gaussian Mixture Model-Based Clustering And Its Application
4	Research Of Variable Selection Method On Near-infrared Spectrum Modeling
5	Clustering Method Based On Variable Selection And Its Application
6	Research On Group Variable Selection Algorithm Based On Variable Clustering
7	Research On Fault Diagnosis Method Of Industrial Process Based On Variable Selection
8	Research On Variable Selection And Prediction Modeling Method For Industrial Complex Data
9	Variable selection in multi-class support vector machine and applications in genomic data analysis
10	Research On Variable Selection Algorithm Based On Deep Variational Information Bottleneck And Its Application