
Variable Selection Method For High Dimensional Data

Posted on: 2022-07-01    Degree: Master    Type: Thesis
Country: China    Candidate: Q Li
GTID: 2480306329989769    Subject: Applied Mathematics
Abstract/Summary:
With the rapid development of science and technology, our lives are closely tied to data; indeed, data now pervade every aspect of production and daily life. Each observed instance carries a large amount of information, and a single observation can have thousands of dimensions. Not all of these data are valuable, however, and high-dimensional data analysis emerged to address the question of how to extract the relevant part from massive data. Since variable selection was first proposed in the early 1960s, it has attracted great interest from many mathematicians. Tibshirani proposed the Lasso in 1996, and Fan and Li proposed the SCAD method in 2001. SCAD retains the advantages of both subset selection and ridge regression: it produces sparse solutions, ensures the continuity of the selected model, and gives unbiased estimates of large coefficients. In the same year, Breiman proposed the random forest algorithm, a comparatively niche method that nevertheless has many advantages, such as good stability, high prediction accuracy, and the ability to rank explanatory variables by importance. In 2006, Zou proposed the Adaptive Lasso. In 2007, Nicolai Meinshausen defined the Relaxed Lasso estimator and proposed a two-stage algorithm for computing it; the Relaxed Lasso estimator builds on the ordinary Lasso.

The thesis is organized as follows. Chapter 1 introduces the research background of the variable selection problem and the state of research in China and abroad, and briefly describes the content of this thesis. Chapter 2 presents preliminary knowledge. Chapter 3 reviews the variable selection methods proposed in recent years and lists their advantages and disadvantages. Chapter 4 introduces a new algorithm. Chapter 5 applies the Lasso to model, and perform variable selection on, GDP data, per capita disposable income data, and birth rate data. Chapter 6 summarizes the thesis.
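For reference, the standard textbook forms of the penalized estimators named in the abstract are collected below. They are not taken from the thesis itself, and the loss scaling (1/(2n) versus 1/n) varies between authors; it only rescales the tuning parameter lambda.

```latex
% Lasso (Tibshirani, 1996), penalized (Lagrangian) form; 1/(2n) is a common convention
\hat{\beta}^{\mathrm{lasso}} = \arg\min_{\beta \in \mathbb{R}^p}
  \frac{1}{2n}\lVert y - X\beta \rVert_2^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert

% Adaptive Lasso (Zou, 2006): weights from an initial estimate \tilde{\beta} (e.g. OLS), \gamma > 0
\hat{\beta}^{\mathrm{alasso}} = \arg\min_{\beta \in \mathbb{R}^p}
  \frac{1}{2n}\lVert y - X\beta \rVert_2^2 + \lambda \sum_{j=1}^{p} w_j \lvert \beta_j \rvert,
  \qquad w_j = \lvert \tilde{\beta}_j \rvert^{-\gamma}

% SCAD penalty (Fan and Li, 2001) on \theta = |\beta_j|, with a > 2 (they suggest a = 3.7);
% it is linear near zero (sparsity) and constant for large \theta (no bias on large coefficients)
p_\lambda(\theta) =
\begin{cases}
  \lambda\theta, & 0 \le \theta \le \lambda,\\[2pt]
  \dfrac{2a\lambda\theta - \theta^2 - \lambda^2}{2(a-1)}, & \lambda < \theta \le a\lambda,\\[4pt]
  \dfrac{(a+1)\lambda^2}{2}, & \theta > a\lambda.
\end{cases}

% Relaxed Lasso (Meinshausen, 2007): \mathcal{M}_\lambda is the set of variables selected by the
% Lasso at \lambda, 1_{\mathcal{M}_\lambda} zeroes coefficients outside it, and 0 < \phi \le 1
\hat{\beta}^{\lambda,\phi} = \arg\min_{\beta \in \mathbb{R}^p}
  \frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i - X_i^{\top}\{\beta \cdot 1_{\mathcal{M}_\lambda}\}\bigr)^2
  + \phi\lambda \lVert \beta \rVert_1
```

Setting phi = 1 recovers the Lasso, while letting phi shrink toward 0 approaches least squares on the selected set, which is how the Relaxed Lasso separates variable selection from coefficient shrinkage.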
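The two-stage idea behind the Relaxed Lasso can be illustrated with a small, dependency-free sketch. This is not the thesis's Chapter 4 algorithm, and it simplifies Meinshausen's procedure (which works along the whole Lasso path) to a single lambda. It assumes the objective (1/(2n))||y - X b||^2 + lam * ||b||_1 solved by cyclic coordinate descent; the function names `soft_threshold`, `lasso_cd`, and `relaxed_lasso` are hypothetical illustrations.

```python
from typing import List, Optional, Sequence


def soft_threshold(z: float, gamma: float) -> float:
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X: Sequence[Sequence[float]], y: Sequence[float], lam: float,
             active: Optional[Sequence[int]] = None,
             n_iter: int = 1000, tol: float = 1e-10) -> List[float]:
    """Cyclic coordinate descent for (1/(2n))||y - X b||^2 + lam * ||b||_1.

    Only coordinates listed in `active` are updated; all others stay at zero.
    """
    n, p = len(X), len(X[0])
    cols = list(range(p)) if active is None else list(active)
    beta = [0.0] * p
    resid = [float(v) for v in y]  # r = y - X beta, with beta = 0 initially
    # z_j = (1/n) * sum_i x_ij^2: curvature of the loss along coordinate j
    z = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in cols:
            if z[j] == 0.0:
                continue  # an all-zero column can never enter the model
            # rho_j = (1/n) * sum_i x_ij * (residual with feature j added back)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / z[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep residual in sync
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def relaxed_lasso(X: Sequence[Sequence[float]], y: Sequence[float],
                  lam: float, phi: float) -> List[float]:
    """Two-stage Relaxed Lasso sketch at a single lam, with 0 < phi <= 1.

    Stage 1: a Lasso fit with penalty lam selects the active set M_lam.
    Stage 2: refit on M_lam only, with the smaller penalty phi * lam.
    """
    stage1 = lasso_cd(X, y, lam)
    support = [j for j, b in enumerate(stage1) if b != 0.0]
    return lasso_cd(X, y, phi * lam, active=support)
```

As a usage example, with the orthogonal design whose columns are x0 = (1, 1, -1, -1), x1 = (1, -1, 1, -1), x2 = (1, -1, -1, 1) and y = 3*x0 + 0.5*x1, `lasso_cd(X, y, 1.0)` keeps only x0 with coefficient 2.0 (shrunk from 3), and `relaxed_lasso(X, y, 1.0, 0.5)` refits x0 alone with half the penalty, giving 2.5. The second stage thus reduces the shrinkage bias on the selected variable without re-admitting the discarded ones.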
Keywords/Search Tags: high-dimensional data, variable selection method, case study