
Variable Selection In Classification And Application

Posted on: 2016-11-24
Degree: Master
Type: Thesis
Country: China
Candidate: M H Xu
Full Text: PDF
GTID: 2310330479454425
Subject: Applied Statistics
Abstract/Summary:
In the era of information explosion, the volume of information grows geometrically. In practical problems, however, a large amount of information can bury the important information under a mass of secondary information and lead to a wrong understanding of the issue at hand. The data must therefore be processed to identify the main information, and a concrete model must be built from it to analyze the specific problem. This process is variable selection. Variable selection is of far-reaching significance for classification problems in statistics, and it is increasingly needed as a step before classification. This thesis uses the Wisconsin breast cancer data set from the UCI repository and roughly one year of air monitoring data for Lanzhou (2014.1 to 2015.3) to identify the main variables in the two problems. At present, the leading variable selection methods apply a penalized likelihood function to the variables' coefficients to estimate the optimal parameters; that is, they achieve variable selection by shrinking the coefficients. This thesis starts instead from measurement error. It assumes that the observed values carry measurement error and builds the likelihood function on the measurement precision. It then solves the resulting optimization problem with the Lasso method, so that the measurement precision of the observations, rather than the coefficients alone, is shrunk. A variable whose measurement precision is shrunk to zero has infinite measurement error variance, so its observed values fluctuate too widely to be informative. Such variables lose their value in the model and are removed from it, which achieves variable selection. The thesis applies the new method in particular to nonparametric classification. On the breast cancer example, the new method is compared with an existing variable selection method: a classifier is built from the variables chosen by each method, and the classifier based on the new method performs better, with a smaller classification error. Applying the new method to the Lanzhou air quality data also shows its advantage and clearly identifies the main air pollutants in Lanzhou over the past year.
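To make concrete the coefficient-shrinkage idea that the abstract attributes to existing methods, here is a minimal sketch of the standard Lasso solved by cyclic coordinate descent. It is not the measurement-precision method proposed in the thesis, whose details the abstract does not give. The function names, the toy data, and the penalty value `lam` are all illustrative assumptions.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; any z within [-t, t] becomes exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                # Prediction from every feature except j.
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
    return beta


# Toy data (assumed): three orthogonal features; y depends strongly on the
# first, weakly on the second, and not at all on the third.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.1, 1.9, -1.9, -2.1]  # y = 2*x1 + 0.1*x2

beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(beta, selected)
```

With this penalty, the coefficients of the weak second feature and the irrelevant third feature are shrunk to exactly zero, so only the first feature is selected. The thesis applies the same shrink-to-zero mechanism to the measurement precision of each variable.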
Keywords/Search Tags:Air quality, Measurement error, Likelihood, Variable selection, Bayes classifier