The Study Of Mean And Median Regression With Ordered Multiple Categorical Covariates Based On Group-Lasso

Posted on: 2023-12-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Ji
Full Text: PDF
GTID: 2530307073986919
Subject: Statistics
Abstract/Summary:
In the era of big data, statistical regression models involve many independent variables of different types, including numerical, categorical, and ordinal categorical variables. For data sets with ordinal categorical independent variables, it is assumed that these variables may contain pseudo-classification, that is, categories that need to be identified and fused. If pseudo-classification is ignored, direct regression modeling leads to over-fitting, which reduces the practical value of the regression model. In addition, when the number of ordinal categorical independent variables is large, one of the most important statistical tasks is to find the independent variables that explain the dependent variable.

To test for pseudo-classification in an ordinal categorical independent variable, an F test is first used to detect whether pseudo-classification exists. If it does, the pseudo-categories must be identified and fused. A recognition method based on a linear transformation of the dummy variables and an adaptive Lasso penalty is proposed. This method identifies the pseudo-categories of the ordinal categorical variables and fuses them. Because adaptive Lasso adds a penalty term to the least squares criterion, its estimates can deviate substantially when the error terms do not satisfy the Gauss-Markov conditions or when the data contain outliers. Quantile regression, by contrast, makes no strong assumptions about the error distribution and is insensitive to outliers, so it yields robust results; for this reason penalized quantile regression has attracted growing attention as a way to improve the robustness of regression models. The pseudo-classification recognition method is therefore improved: median regression with an adaptive Lasso penalty is applied to the linearly transformed dummy variables, which improves the robustness of the model.

Many variable selection methods exist for different regression models. For data sets with a group structure, such as those with ordinal categorical independent variables, Group-Lasso performs better at variable selection. However, Group-Lasso applies the same degree of shrinkage to every group of variables, which easily leads to excessive shrinkage. Adaptive Group-Lasso is therefore used to reduce the resulting estimation bias. Similarly, replacing the least squares loss with the median regression loss gives adaptive Group-Lasso variable selection based on median regression, which improves robustness.

For high-dimensional data sets with ordinal categorical independent variables, this thesis combines variable selection with the identification and fusion of pseudo-categories. The combination yields robust parameter estimates through variable selection and avoids the over-fitting caused by pseudo-classification. Simulation results show that the method is effective at both detecting pseudo-classification and selecting variables. In the empirical part, two empirical analyses are carried out. The results show that, after pseudo-categories are fused, the model is simpler and fits better.
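The abstract does not say which linear transformation of the dummy variables is used. A minimal sketch, assuming the common "split coding" of an ordinal variable, illustrates why such a transformation makes fusion possible: each transformed dummy's coefficient is the jump between two adjacent levels, so a coefficient shrunk to zero by the penalty marks two levels that can be merged. The function names `split_code` and `fuse_levels`, the tolerance `tol`, and the example coefficient values are all hypothetical and are not taken from the thesis.

```python
def split_code(level, n_levels):
    """Split-code an ordinal level in 1..n_levels into n_levels - 1 dummies.

    Dummy j (j = 1..n_levels - 1) is 1 when level > j, so its regression
    coefficient is the jump between adjacent levels j and j + 1.
    """
    if not 1 <= level <= n_levels:
        raise ValueError("level out of range")
    return [1 if level > j else 0 for j in range(1, n_levels)]


def fuse_levels(jumps, tol=1e-8):
    """Group adjacent levels whose estimated jump is (numerically) zero."""
    groups = [[1]]
    for j, jump in enumerate(jumps, start=1):
        if abs(jump) <= tol:
            groups[-1].append(j + 1)  # no jump: level j + 1 joins the current group
        else:
            groups.append([j + 1])    # real jump: level j + 1 starts a new group
    return groups


# Hypothetical penalized estimates for one 5-level ordinal covariate
# (illustrative numbers only, not results from the thesis).
jumps = [0.8, 0.0, 0.0, 1.3]
print(split_code(3, 5))    # [1, 1, 0, 0]
print(fuse_levels(jumps))  # [[1], [2, 3, 4], [5]]
```

In this sketch the jump estimates would come from whichever penalized fit is used (least squares or median regression with an adaptive Lasso penalty, in the abstract's terms); the fusion step itself only reads off which jumps are zero.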
Keywords/Search Tags:median regression, ordinal multinomial, adaptive Group-Lasso, dummy variable