
Research On Robust Regularization Methods For Regression

Posted on: 2022-01-18    Degree: Doctor    Type: Dissertation
Country: China    Candidate: M H Su    Full Text: PDF
GTID: 1480306509466434    Subject: Computer Science and Technology
Abstract/Summary:
The rapid development of science and technology has made data collection easier, but it also poses great challenges for data analysis. As a mathematical tool for studying the correlation between variables, regression analysis has become one of the main instruments of data analysis. The central task of regression analysis is to estimate the regression coefficients (parameters), that is, to determine the relationship between variables from known observations so as to predict or interpret unseen data. As a parameter estimation framework, regularization has been widely used in many fields. The best-known example is the penalized least squares estimator, but it is not robust because the squared loss is sensitive to large residuals. Therefore, when the model error follows a heavy-tailed distribution or the data contain outliers, how to ensure the robustness of the estimator, and thereby improve the accuracy and interpretability of the regression model, is worth studying.

This dissertation focuses on two aspects. On the one hand, it studies the robustness of parameter estimation in linear models, concentrating on the selection of weights in weighted regularization methods and on the correlation among variables in quantile regression. By resolving the weight-selection and variable-correlation problems, the robustness of the model, and hence its prediction accuracy, is improved. On the other hand, it considers generalized regression models, specifically how to improve robustness from the perspective of model construction: mainly how to integrate sample neighborhood information into the model, and how to handle heavy-tailed error distributions when interaction terms are added to the regression model. The main contributions are summarized as follows.

(1) A self-weighted robust regularization method is proposed. The weighted regularization
methods have the ability to improve the robustness of a model; however, an improper choice of weights may cause large deviations in the parameter estimates and thus reduce prediction accuracy. By adding an adaptive regularization term to the weighted model, the weights can be determined directly from the per-sample losses. This both resolves the weight-selection problem and improves prediction accuracy. The consistency of the estimator is proved theoretically, and a coordinate descent algorithm based on alternating iteration is given. Experiments on a large number of synthetic data sets and UCI data sets demonstrate the effectiveness of the proposed method; in particular, when the data contain outliers, it outperforms competing methods.

(2) An Elastic Net penalized quantile regression model is proposed to handle correlated covariates. Within the regularization framework, the L1 penalty produces sparse solutions, the L2 penalty allows strongly correlated variables to be selected together, and quantile estimation is robust to outliers. Combining these respective advantages, an Elastic Net penalized quantile loss is proposed. A theoretical interpretation of the model is given and the consistency of the estimator is established; an ADMM algorithm for solving the problem is also provided. Experimental results show that the proposed method performs well: it is not only robust to outliers and heavy-tailed errors but also handles correlation between variables.

(3) A regression model incorporating sample neighborhood information is established. Through a network graph structure, information from adjacent samples is effectively added to the regression model, so that the model captures not only the attribute features of each sample but also the information of its neighbors. An L1 penalized least squares estimator is then proposed for the regression parameters. Theoretically, the error
bound of the estimator is established under the restricted eigenvalue condition. Computationally, the problem can be solved by coordinate descent. Extensive experiments on synthetic data sets show that the proposed model achieves higher prediction accuracy than ordinary linear regression; an application to housing price forecasting further verifies its effectiveness.

(4) A robust regularized estimation method for regression with interaction terms is proposed. Although the linear regression model is simple and easy to interpret, it cannot capture all the information provided by the samples. Introducing interaction terms removes the additivity assumption of the linear model from the perspective of model assumptions, and allows hidden information in the data to be exploited from the perspective of model interpretation. To handle heavy-tailed error distributions, two robust parameter estimation methods are proposed, based on the L1 penalty and the adaptive L1 penalty, respectively; the consistency of the latter is proved, and an ADMM algorithm for solving the proposed models is developed. Experiments on synthetic data sets show that when the model error follows a heavy-tailed distribution, the proposed estimators improve prediction accuracy and have clear advantages in variable selection. Experiments on real data sets show that the regression model with interaction terms is superior to the linear model, and that the proposed estimators outperform the penalized least squares estimators in both prediction accuracy and variable selection.

Within the regularization framework, this dissertation thus focuses on robust parameter estimation and model construction for regression. On the one hand, the results advance the theory of robust regularized estimation; on the other hand, it provides an effective
robust estimation method for the analysis of complex data, and has considerable practical value.
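To illustrate the self-weighting idea of contribution (1), the sketch below alternates between a weighted ridge fit and weights recomputed from the per-sample losses. The entropy-style weight regularizer (which yields weights proportional to exp(-loss/gamma)), the function name, and all parameter values are assumptions for illustration; the dissertation's actual adaptive regularization term and its consistency theory may differ.

```python
import numpy as np

def self_weighted_ridge(X, y, lam=1e-3, gamma=1.0, n_iter=20):
    """Alternating iteration: (i) fit beta by weighted ridge regression,
    (ii) set weights from per-sample losses via an entropy-style
    regularizer, w_i proportional to exp(-loss_i / gamma).
    Illustrative sketch only, not the dissertation's exact formulation."""
    n, p = X.shape
    w = np.full(n, 1.0 / n)                     # start from uniform weights
    for _ in range(n_iter):
        # weighted ridge step: solve (X^T W X + lam I) beta = X^T W y
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw + lam * np.eye(p), Xw.T @ y)
        # self-weighting step: large-loss samples (e.g. outliers)
        # receive exponentially small weight
        loss = (y - X @ beta) ** 2
        w = np.exp(-loss / gamma)
        w /= w.sum()
    return beta, w
```

Because an outlier's loss stays large, its weight collapses toward zero after the first alternation, so the subsequent ridge fits effectively ignore it, which is the mechanism behind the robustness claimed for weighted regularization.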
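The objective of contribution (2) combines the quantile check loss with both Elastic Net penalty terms. The dissertation solves this model with ADMM; the sketch below instead uses plain subgradient descent, which is simpler but illustrates the same objective. The function name, step size, and penalty values are illustrative assumptions.

```python
import numpy as np

def quantile_elastic_net(X, y, tau=0.5, lam1=0.001, lam2=0.001,
                         lr=0.05, n_iter=2000):
    """Subgradient descent on the Elastic Net penalized quantile objective
        mean(rho_tau(y - X b)) + lam1 * ||b||_1 + lam2 * ||b||_2^2,
    where rho_tau(r) = r * (tau - 1[r < 0]) is the check loss.
    A plain sketch; the dissertation's solver is ADMM."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        r = y - X @ b
        psi = tau - (r < 0).astype(float)   # subgradient of the check loss
        g = -(X.T @ psi) / n + lam1 * np.sign(b) + 2.0 * lam2 * b
        b -= lr * g
    return b
```

With tau = 0.5 this is penalized median regression, which is why the estimator tolerates heavy-tailed errors: the subgradient psi is bounded regardless of how large a residual becomes.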
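For contribution (4), the interaction-term design can be sketched as augmenting the design matrix with pairwise products of covariates and then applying a sparsity-inducing penalty. The sketch below uses an ordinary L1-penalized least squares fit via coordinate descent, i.e. the non-robust baseline the dissertation compares against; both function names and all parameter values are illustrative assumptions.

```python
import numpy as np

def with_interactions(X):
    """Augment X with all pairwise interaction columns x_i * x_j (i < j),
    removing the additivity assumption of the plain linear model."""
    n, p = X.shape
    cols = [X] + [(X[:, i] * X[:, j])[:, None]
                  for i in range(p) for j in range(i + 1, p)]
    return np.hstack(cols)

def lasso_cd(X, y, lam=0.01, n_iter=100):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                               # running residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                # remove coordinate j's part
            rho = X[:, j] @ r / n
            # soft-thresholding update for coordinate j
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b
```

The dissertation's robust variants replace the squared loss above with a loss suited to heavy-tailed errors and use an (adaptive) L1 penalty solved by ADMM; this sketch only shows how interaction terms enter the design matrix and how L1 sparsity selects among them.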
Keywords/Search Tags: Regression, Regularization, Robust, Variable Selection, Optimization Algorithm, Heavy-tailed error