Estimation Of Elastic Net Penalized Composite Quantile Regression For High-dimensional Data With Applications

Posted on:2024-05-09

Degree:Master

Type:Thesis

Country:China

Candidate:G H Zhang


GTID:2557306917990319

Subject:Statistics

Abstract/Summary:


Variable selection has long been a central topic in statistical modeling and analysis. In recent years, the rapid development of information technology has made high-dimensional data easy to collect. In such data the number of variables is often much larger than the number of samples, which poses a serious challenge to traditional variable selection methods. At the same time, strong correlation, multicollinearity, and outliers are common in high-dimensional data and tend to make models overfit and unstable, which directly harms prediction accuracy. This thesis therefore aims to propose a robust estimation method that copes with these problems and performs variable selection for high-dimensional data. The main work and results are as follows.

First, Chapter 3 develops a composite quantile regression model with an elastic net penalty and shows theoretically that, under certain conditions, the estimator is consistent, sparse, and asymptotically normal. The penalty combines the ridge and LASSO penalties, so it performs variable selection and coefficient shrinkage simultaneously while also addressing multicollinearity and strong correlation. Its structure gives it a grouping effect: highly correlated variables enter or leave the model together. In addition, replacing the least squares loss with the composite quantile loss makes the estimates robust to outliers and to data with peaked or heavy-tailed distributions.

Second, Chapter 4 sets out the algorithmic steps for solving the model with the alternating direction method of multipliers (ADMM) and designs numerical simulation experiments that compare the proposed method with other commonly used models to show its advantages. Because the composite quantile loss is convex but not differentiable, conventional solution algorithms tend to get stuck at suboptimal solutions. The sheer size of high-dimensional data also tests the efficiency of the solver. ADMM, a well-established distributed optimization algorithm, can reach efficient and accurate solutions through its decomposition and coordination steps.

Finally, Chapter 5 applies the model and algorithm to leukemia classification. Compared with mainstream machine learning algorithms, the model achieves higher classification accuracy, which demonstrates its practical value for real problems in the era of big data.
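The objective described in the abstract can be illustrated with a minimal sketch. It assumes the usual composite quantile formulation, in which one coefficient vector is shared across K quantile levels and each level has its own intercept, and it assumes the penalty is written as lam1 * ||beta||_1 + lam2 * ||beta||_2^2. The function names, argument names, and this parameterization are illustrative assumptions and are not taken from the thesis. The sketch only evaluates the objective; it does not implement the ADMM solver.

```python
import numpy as np


def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


def cqr_enet_objective(beta, intercepts, X, y, taus, lam1, lam2):
    """Composite quantile loss over several levels plus an elastic net penalty.

    Assumed (hypothetical) parameterization:
        sum_k sum_i rho_{tau_k}(y_i - b_k - x_i' beta)
        + lam1 * ||beta||_1 + lam2 * ||beta||_2^2
    """
    resid = y - X @ beta  # residuals before the level-specific intercepts
    loss = sum(
        check_loss(resid - b_k, tau).sum()
        for b_k, tau in zip(intercepts, taus)
    )
    penalty = lam1 * np.abs(beta).sum() + lam2 * np.square(beta).sum()
    return loss + penalty


# Toy usage with made-up numbers (not data from the thesis).
X = np.eye(2)
y = np.array([1.0, -1.0])
value = cqr_enet_objective(
    beta=np.zeros(2), intercepts=[0.0], X=X, y=y,
    taus=[0.5], lam1=1.0, lam2=1.0,
)
```

A common choice in the composite quantile regression literature is equally spaced levels tau_k = k / (K + 1); whether the thesis uses that choice is not stated in the abstract.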

Keywords/Search Tags:

High-dimensional data, Variable selection, Elastic net, Composite quantile regression


Related items

1	Variable Selection Of High Dimensional Models With Longitudinal Data
2	Some Estimates And Tests Of Panel Data
3	Elastic Correlation Adjusted Regression(ECAR) Score For High Dimensional Variable Importance Measuring
4	Bayesian Statistical Inference For Quantile Regression Models
5	Spatial Bayesian Variable Selection For High-dimensional Scalar-on-Image Regression Under Current Status Data
6	Research On Statistical Inference And Related Issues Of High-dimensional Semiparametric Regression Models
7	Variable Screening Of Regression Models With Missing Data At Random
8	Variable Selection And Variable Screening In High Dimensional Data With Multivariate Responses
9	Variable Selection Method Based On High-Dimensional Multiple Correlation Coefficients
10	Instrumental Variables Estimation And Application Of Quantile Regression Model With Panel Data