| Variable selection has long been a central topic in statistical modeling and analysis. In recent years, the rapid development of information technology has made high-dimensional data easy to collect. In such data the number of variables is often much larger than the number of samples, which poses a serious challenge to traditional variable selection methods. In addition, strong correlation, multicollinearity, and outliers are common in high-dimensional data and tend to cause overfitting and instability, which directly degrade prediction accuracy. This thesis therefore aims to propose a robust estimation method that copes with these problems and performs variable selection for high-dimensional data. The main work and results are as follows. First, Chapter 3 develops a composite quantile regression model with an elastic net penalty and shows theoretically that, under certain conditions, the resulting estimator satisfies consistency, sparsity, and asymptotic normality. The elastic net penalty combines the ridge and LASSO penalties, so it performs variable selection and coefficient shrinkage simultaneously while also handling multicollinearity and strong correlation. Its structure gives it a grouping effect, i.e., highly correlated variables tend to enter or leave the model together. Moreover, replacing the least squares loss with the composite quantile loss makes the estimates robust to outliers and to peaked or heavy-tailed error distributions. Second, Chapter 4 derives the steps for solving this model with the alternating direction method of multipliers (ADMM) and designs numerical simulation experiments that demonstrate the advantages of the proposed method over other commonly used models. Because the composite quantile loss is convex but not differentiable, conventional algorithms that rely on gradients cannot be applied to it directly. The large scale of high-dimensional data also places demands on computational efficiency; ADMM, a widely used distributed optimization algorithm, addresses both issues through its decomposition and coordination procedure and yields efficient and accurate solutions. Finally, Chapter 5 applies the model and algorithm to leukemia classification and compares them with mainstream machine learning algorithms; the proposed model achieves higher classification accuracy, which demonstrates its effectiveness on practical problems and its application value in the era of big data. |
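
To make the model described above concrete, the following is a minimal sketch of the objective that an elastic net penalized composite quantile regression typically minimizes: the check loss summed over several quantile levels, plus an L1 and a squared L2 penalty on the shared slope vector. The function names, the argument names (`intercepts`, `taus`, `lam1`, `lam2`), and the exact scaling of the penalty terms are assumptions made for illustration; the thesis may use a different parameterization. The sketch only evaluates the objective and does not implement the ADMM solver.

```python
from typing import Sequence


def check_loss(u: float, tau: float) -> float:
    """Quantile check loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def cqr_elastic_net_objective(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    beta: Sequence[float],
    intercepts: Sequence[float],
    taus: Sequence[float],
    lam1: float,
    lam2: float,
) -> float:
    """Evaluate an assumed form of the penalized composite quantile objective.

    One intercept per quantile level, one slope vector shared by all levels.
    """
    loss = 0.0
    for b_k, tau in zip(intercepts, taus):
        for x_i, y_i in zip(X, y):
            fitted = b_k + sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))
            loss += check_loss(y_i - fitted, tau)
    l1_penalty = sum(abs(b_j) for b_j in beta)      # LASSO part: sparsity
    l2_penalty = sum(b_j * b_j for b_j in beta)     # ridge part: grouping effect
    return loss + lam1 * l1_penalty + lam2 * l2_penalty
```

Because the check loss grows only linearly in the residual, a single large outlier contributes far less to this objective than it would under a squared-error loss, which is the source of the robustness the abstract refers to.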