Classical regression analysis generally assumes that, given the explanatory variables, the conditional expectation of the model error is zero; that is, the explanatory variables are exogenous covariates. In the era of big data, however, statistical modeling often involves ultra-high-dimensional covariates. In this setting it is usually impractical to assume that all covariates are exogenous; some covariates will be endogenous, meaning that the conditional expectation of the model error given the explanatory variables is no longer zero. When the data contain endogenous covariates, the classical least squares estimator is no longer consistent and suffers from endogeneity bias. At present, the main approach to handling endogeneity in statistical modeling is the instrumental variable method, in which a set of variables that are highly correlated with the endogenous covariates is selected and used to adjust them. How to select valid instrumental variables is therefore a key step in modeling endogenous data, since the choice of instruments directly affects the consistency and validity of the subsequent estimates. Motivated by this, this paper studies the selection of valid instrumental variables in the statistical modeling of endogenous data and proposes two identification methods for valid instrumental variables. The first method constructs an auxiliary regression model and, exploiting the structure linking the endogenous covariates and the instrumental variables, applies penalized least absolute deviation estimation to identify valid instruments. The second method does not require an auxiliary regression model; instead, it identifies valid instruments directly through a group penalty based on the correlation structure between the endogenous covariates and the instrumental variables. The two methods each have their own advantages in terms of algorithm and computation, and simulation studies show that both identify valid instrumental variables effectively. As an application, the proposed modeling approach is used to analyze the factors influencing China's ecological environment. In the modeling process, the urbanization rate is treated as an endogenous covariate, and candidate instrumental variables are drawn from three areas: economy and people's livelihood, education, and employment. The proposed identification method selects the average transaction price of housing as a valid instrumental variable; the endogenous covariate is then adjusted using this instrument, and finally the parameters of the linear regression model are estimated. The results show that if the endogeneity of the urbanization rate is ignored, its impact on ecological and environmental conditions is underestimated.
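The endogeneity bias and the instrumental variable adjustment described above can be illustrated with a small simulation. The sketch below is not the penalized identification method proposed in this paper; it is a textbook two-stage least squares fit on synthetic data. The data-generating process, the coefficient values, and the function names (`ols`, `two_stage_least_squares`, `simulate`) are all assumptions made for illustration.

```python
import numpy as np


def ols(X, y):
    """Least squares coefficients for y ~ X (X must already contain an intercept column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_stage_least_squares(x_endog, Z, y):
    """Textbook 2SLS with one endogenous covariate and instrument matrix Z (n x q)."""
    n = len(y)
    Z1 = np.hstack([np.ones((n, 1)), Z])
    # Stage 1: replace the endogenous covariate by its projection on the instruments.
    x_hat = Z1 @ ols(Z1, x_endog)
    # Stage 2: regress the response on the adjusted covariate.
    X2 = np.column_stack([np.ones(n), x_hat])
    return ols(X2, y)


def simulate(n=20000, beta=2.0, seed=0):
    """Hypothetical data: u drives both x and the error, so E[error | x] != 0."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, 1))           # instrument: correlated with x, unrelated to the error
    u = rng.normal(size=n)                # unobserved common factor
    x = z[:, 0] + u + rng.normal(size=n)  # endogenous covariate
    eps = -u + rng.normal(size=n)         # model error, negatively correlated with x
    y = 1.0 + beta * x + eps
    return x, z, y


x, z, y = simulate()
beta_ols = ols(np.column_stack([np.ones(len(y)), x]), y)[1]
beta_iv = two_stage_least_squares(x, z, y)[1]
print(f"true slope 2.0, OLS estimate {beta_ols:.3f}, 2SLS estimate {beta_iv:.3f}")
```

In this assumed setup the error is negatively correlated with the covariate, so least squares understates the slope while the instrument-adjusted fit lands close to the true value. The direction of the bias is a property of the simulated design, not a general rule.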