
Constrained Variable Selection And Conditional Feature Screening For High Dimensional Models

Posted on: 2016-06-28    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Q Q Hu    Full Text: PDF
GTID: 1220330461485533    Subject: Financial mathematics and financial engineering

Abstract/Summary:
High-dimensional data are frequently collected in a large variety of areas such as biomedical imaging, functional magnetic resonance imaging, finance and earth science. In high-dimensional data, the number of variables or parameters p can be larger than the sample size n. Such a "large p, small n" problem poses many challenges for statistical analysis. The sparsity principle, which assumes that only a small number of predictors contribute to the response, is frequently adopted and deemed useful in the analysis of high-dimensional data. In practice, based on previous investigations and experience, researchers often impose constraints on parameters or know that a certain set of predictors is related to the response. This thesis aims at using such conditional information to improve the performance of estimation and variable selection, and to reduce the influence of correlation among the predictors on feature screening.

In Chapter 2, we study variable selection with constraints on parameters for high-dimensional models. There are many applications where constraints are imposed on parameters, or a certain set of predictors is known to be related to the response, to accommodate existing knowledge and assumptions about the problem. For example, Fan et al. (2012) considered portfolio selection, where the objective is to determine how to allocate investment among p different assets to maximize return; the problem was formulated as a linear regression with a lasso penalty and the linear equality constraint $\sum_{j=1}^{p}\beta_j = 1$. The performance of estimation and variable selection can be further improved by incorporating such prior knowledge as constraints on parameters, as in constrained least squares. Chapter 2 studies the linearly constrained generalized lasso (lcg-lasso for short) in the high-dimensional linear model. The dual of the problem is derived, and it is a much simpler problem than the original one; as a by-product, a coordinate descent algorithm is available to solve the dual. A formula for the number of degrees of freedom is derived, and a method for selecting the tuning parameter is also discussed.

We consider the following optimization problem:
$$\min_{\beta \in \mathbb{R}^p} \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda\|D\beta\|_1 \quad \text{subject to} \quad C\beta = d, \; E\beta \le f,$$
where $D \in \mathbb{R}^{m \times p}$, $C \in \mathbb{R}^{q \times p}$, $d \in \mathbb{R}^q$, $E \in \mathbb{R}^{s \times p}$ and $f \in \mathbb{R}^s$ are specified according to the prior information. The lasso and its variants are special cases of lcg-lasso with proper choices of D, C, d, E and f, such as the adaptive lasso (Zou, 2006), the fused lasso (Tibshirani et al., 2005), the generalized lasso (Tibshirani and Taylor, 2011) and the constrained lasso of Fan et al. (2012). X is assumed to have full column rank. After ignoring a constant, the Lagrangian dual problem of lcg-lasso is
$$\min_{u,\xi,\eta}\ \tfrac{1}{2} v^\top (X^\top X)^{-1} v + \xi^\top d + \eta^\top f \quad \text{subject to} \quad \|u\|_\infty \le \lambda,\ \eta \ge 0, \qquad v = X^\top y - D^\top u - C^\top \xi - E^\top \eta,$$
and the relationship between the primal solution β and the dual solution (u, ξ, η) is $\beta = (X^\top X)^{-1} v$. Note that this dual problem is a standard quadratic programming problem. Compared with the primal problem, the dual is much easier to solve, because it has fewer parameters and all of its constraints are box constraints. We derive a coordinate descent algorithm to solve it in Section 2.4. From the KKT conditions of the dual problem, we define two boundary sets for the dual solution, $\mathcal{U} = \{i : |u_i| = \lambda\}$ and $\mathcal{C} = \{l : \eta_l > 0\}$. Through the relationship between the primal and dual solutions, we obtain the corresponding boundary sets A and B for the primal solution, where A and B are the sets of indexes corresponding to nonzero $D_i\beta$ and to active inequality constraints, respectively. For any λ, when $\mathcal{U}$ and $\mathcal{C}$ are known, we can write out an explicit expression for the dual solutions u, ξ, η, and hence explicit expressions for the primal solution β and the fitted response μ = Xβ.
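To make the box-constrained structure of the dual concrete, here is a minimal sketch (not the thesis code) of cyclic coordinate descent for a quadratic program of that form, min_v ½vᵀQv + bᵀv subject to l ≤ v ≤ u. For the lcg-lasso dual one would take v = (u, ξ, η), Q = M(XᵀX)⁻¹Mᵀ with M stacking the rows of D, C and E, a matching linear term, and bounds [−λ, λ] for u, (−∞, ∞) for ξ and [0, ∞) for η; this mapping and all names are illustrative assumptions.

```python
import numpy as np

def cd_box_qp(Q, b, lower, upper, max_iter=500, tol=1e-8):
    """Cyclic coordinate descent for min_v 0.5*v'Qv + b'v, lower <= v <= upper.

    Each coordinate update is an exact one-dimensional quadratic minimization
    followed by clipping to the box, which is what makes box constraints so
    convenient here. Q is assumed symmetric positive definite.
    """
    m = len(b)
    v = np.zeros(m)
    for _ in range(max_iter):
        v_old = v.copy()
        for k in range(m):
            # b_k + sum_{j != k} Q_kj v_j: linear part of the 1-D subproblem.
            r = b[k] + Q[k] @ v - Q[k, k] * v[k]
            v[k] = np.clip(-r / Q[k, k], lower[k], upper[k])
        if np.max(np.abs(v - v_old)) < tol:
            break
    return v
```

Unbounded coordinates (the equality multipliers ξ) are handled by passing ±np.inf as their bounds, so the same loop covers all three blocks of the dual variable.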
In Section 2.3, we show that for any given λ > 0 and almost every y, the boundary sets $\mathcal{U}$ and $\mathcal{C}$ are locally constant in a neighborhood of y, β is continuous, and μ is uniformly Lipschitz continuous for almost every y. That means μ(y) is continuous and almost differentiable as a function of y. By Stein's lemma (Stein, 1981), if y follows a normal distribution then, for any D, C, E and λ ≥ 0, the degrees of freedom of μ = Xβ equals the expected nullity of the matrix obtained by stacking the rows of D outside $\mathcal{U}$ together with C and the active rows of E, where the nullity of a matrix M is defined to be the dimension of its null space null(M).

In many practical applications, X does not have full column rank, that is, rank(X) < p, and the proposed methods are not directly applicable. One easy way to get around this is to impose an extra ℓ2 penalty on lcg-lasso and consider
$$\min_{\beta \in \mathbb{R}^p} \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda\|D\beta\|_1 + \tfrac{\gamma}{2}\|\beta\|_2^2 \quad \text{subject to} \quad C\beta = d, \; E\beta \le f,$$
where γ > 0 is a small positive number. Let β* be the solution of the above problem; the degrees of freedom of μ = Xβ* admits an analogous expression. With an unbiased estimate of the degrees of freedom, we can minimize the estimated risk over λ to select an appropriate value for the tuning parameter. By the connection between Mallows' Cp and AIC/BIC, a BIC-type criterion can be defined as
$$\Omega(\lambda) = \|y - X\hat\beta(\lambda)\|_2^2 + w_n\,\hat\sigma^2\,\widehat{\mathrm{df}}(\mu_\lambda),$$
where w_n is a selected constant. When w_n = 2, the criterion is Mallows' Cp or AIC, while when w_n = log(n) the criterion becomes BIC. Therefore, we can use the BIC-type criterion to select the tuning parameter λ in lcg-lasso.

In Chapter 3, given a set of active predictors known in advance, we consider feature screening for ultra-high dimensional parametric models. Fan and Lv (2008) emphasized the importance of feature screening in ultrahigh-dimensional data analysis, and since their seminal work on sure independence screening there has been a surge of interest in ultra-high dimensional feature screening. However, most existing methods, such as SIS and its variants, screen variables by ranking a marginal utility such as the marginal correlation with the response. Because of correlation among the predictors, sample marginal screening can screen out hidden important variables that have a big impact on the response but are weakly correlated with it, and it can also recruit variables that have strong marginal utility but are conditionally independent of the response given the other variables. In many applications, based on previous investigations and experience, researchers know in advance that a certain set of predictors is related to the response. In Chapter 3, based on this conditional information, we propose a conditional feature screening procedure via ranking conditional marginal empirical likelihood ratios (CMELR-CSIS for short).

Let X_C denote the set of active predictors known in advance. Consider the following moment condition: for any vector or matrix β_C,
$$E[g_j(\alpha_j)] = 0, \qquad g_j(\alpha) = \{X_j - E(X_j \mid X_C^\top \beta_C)\}\,Y - \alpha,$$
where, up to standardization, α_j is the correlation coefficient between the centralized variable X_j − E(X_j | X_C^⊤β_C) and the response Y. Based on the above moment condition, we can construct the conditional marginal empirical likelihood ratio
$$\ell_j(0) = -2\log\Big(\sup\Big\{\prod_{i=1}^{n} n w_i : w_i \ge 0,\ \sum_{i=1}^{n} w_i = 1,\ \sum_{i=1}^{n} w_i g_{ij}(0) = 0\Big\}\Big).$$
The empirical likelihood ratio should not be large when evaluated at the true value, while the marginal empirical likelihood ratio statistic has a high probability of taking a large value when evaluated at false values. That means ℓ_j(0) can be used as a feature screening tool. Since ℓ_j(0) involves the unknown conditional mean in the estimating function, we estimate it under the linearity condition and replace ℓ_j(0) by its estimator ℓ̂_j(0). Finally, we select the index set of active variables as
$$\hat{A} = \{1 \le j \le p : \hat\ell_j(0) \ge \gamma_n\},$$
where γ_n is a predefined threshold value and A denotes the active set.
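The tuning rule above reduces to a one-dimensional search over λ, sketched below under the assumption of two hypothetical helpers, fit(lam) returning the fitted response μ_λ and df_hat(lam) its estimated degrees of freedom, plus an external noise-variance estimate sigma2; none of these names come from the thesis.

```python
import numpy as np

def select_lambda(lambdas, fit, df_hat, y, sigma2, w_n):
    """Pick lambda minimizing ||y - mu_lambda||^2 + w_n * sigma2 * df(lambda).

    w_n = 2 gives Mallows' Cp / AIC; w_n = log(n) gives the BIC-type rule.
    """
    crits = [np.sum((y - fit(lam)) ** 2) + w_n * sigma2 * df_hat(lam)
             for lam in lambdas]
    return lambdas[int(np.argmin(crits))]
```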
This method is referred to as conditional sure independence screening based on the conditional marginal empirical likelihood ratio, or CMELR-CSIS for short. CMELR-CSIS is demonstrated to be effective under less restrictive distributional assumptions, inheriting the advantages of the empirical likelihood approach, and it is computationally simple because it only needs to evaluate the conditional marginal empirical likelihood ratio at one point, without parameter estimation or iterative algorithms. The theoretical results reveal that the proposed procedure has the sure screening property and, with a suitable threshold value, controls the size of the selected set of variables very well. Extensive numerical examples further demonstrate that in the case of high correlation among variables, unconditional screening procedures come close to collapse while the proposed procedures still work well. Moreover, simulation results show the robustness of CMELR-CSIS to the choice of the conditioning set, and an effective method is provided for constructing the conditioning set when no such information is available.

In Chapter 4, we investigate feature screening for either or both of the mean and variance functions in high-dimensional regression models under a multiple-index framework. Existing feature screening methodologies focus mainly on the mean function of regression models; the variance function, however, plays an important role in statistical theory and applications. With the model-free feature screening methods proposed by Zhu et al. (2011) and Lin et al. (2013), we can successfully sort out the active predictors without any model assumption, but we cannot clearly identify which active predictors are related to the mean function and which are related to the variance function. In Chapter 4, we consider feature screening for the following ultra-high dimensional multiple-index heteroscedastic model:
$$Y = g(B_\mu^\top X) + \sigma(B_v^\top X)\,\varepsilon,$$
where the link functions g and σ are unknown and ε is independent of X with mean E(ε) = 0 and variance E(ε²) = 1. This model includes many popular semiparametric regression models as special cases, such as the partially linear model, the single index model, and the partially linear single index model. When we focus on the mean function, under some regularity conditions the covariance between the centralized variable X_j − E(X_j | X_C^⊤β_C) and Y vanishes for every j outside A_μ, where A_μ denotes the active index set of the mean function. Based on this estimating function we construct a conditional marginal empirical likelihood ratio and estimate the goal set D ∩ A_μ, with D denoting the candidate predictors outside the conditioning set, by
$$\widehat{D \cap A_\mu} = \{j \in D : \hat\ell_j(0) \ge \gamma_n\},$$
where γ_n is a predefined threshold value, $\hat\ell_j(0) = 2\sum_{i=1}^{n}\log(1 + \hat\lambda\,\hat g_{ij})$, and λ̂ is the Lagrange multiplier satisfying $\sum_{i=1}^{n} \hat g_{ij}/(1 + \hat\lambda\,\hat g_{ij}) = 0$. We call this method EL-CFS. For the variance function, since Y² carries information from both the mean and the variance functions, the correlation between the centralized variable X_j − E(X_j | X_C^⊤β_C) and Y² is informative about the indices of both the mean and variance functions. Replacing Y_i by Y_i² in the estimating function, so that ĝ_{ij} now estimates the corresponding Y²-based moment, and thresholding the resulting ratios at γ_n, we obtain an estimator of D ∩ A, where A = A_μ ∪ A_v is the active set of the multiple-index heteroscedastic model, and A_μ and A_v are the active sets of the mean and variance functions, respectively. The set difference between the estimator of D ∩ A and the estimator of D ∩ A_μ can obviously be used as an estimator of D ∩ A_v. This modification of EL-CFS is called mEL-CFS, and in Chapter 4 we show that it yields a consistent estimator of D ∩ A_v. Therefore, through EL-CFS and mEL-CFS, we can obtain a consistent estimator of the active set of the variance function in the heteroscedastic model.
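To make the screening statistic concrete, the following is a small sketch of the one-dimensional empirical likelihood ratio evaluated at zero, together with a simplified CMELR-CSIS loop. It assumes the linearity condition, so that E(X_j | X_C) is estimated by least squares on the conditioning block, and it ranks candidates by ℓ̂_j(0) rather than thresholding at γ_n; all names and these simplifications are illustrative, not the dissertation's implementation.

```python
import numpy as np
from scipy.optimize import brentq

def el_ratio_at_zero(g, eps=1e-10):
    """l(0) = 2 * sum_i log(1 + lam*g_i), where lam solves
    sum_i g_i / (1 + lam*g_i) = 0 (Owen's empirical likelihood).

    If 0 lies outside the convex hull of {g_i}, the ratio is +inf, which for
    screening simply puts the variable at the top of the ranking.
    """
    g = np.asarray(g, dtype=float)
    if g.min() >= 0 or g.max() <= 0:
        return np.inf
    lo = -1.0 / g.max() + eps          # keep every 1 + lam*g_i positive
    hi = -1.0 / g.min() - eps
    lam = brentq(lambda t: np.sum(g / (1.0 + t * g)), lo, hi)
    return 2.0 * np.sum(np.log1p(lam * g))

def cmelr_csis(X, y, cond_idx, n_keep):
    """Rank candidate predictors by the conditional marginal EL ratio."""
    n, p = X.shape
    Xc = np.column_stack([np.ones(n), X[:, cond_idx]])
    H = Xc @ np.linalg.pinv(Xc)        # hat matrix of the conditioning block
    resid = X - H @ X                  # X_j - E_hat(X_j | X_C), all j at once
    yc = y - y.mean()
    cand = [j for j in range(p) if j not in set(cond_idx)]
    stats = {j: el_ratio_at_zero(resid[:, j] * yc) for j in cand}
    return sorted(cand, key=lambda j: -stats[j])[:n_keep]
```

As the abstract notes, each statistic only requires evaluating the ratio at a single point: one monotone root-finding step per candidate, with no iterative model fitting.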
It is interesting that the newly proposed screening procedures avoid estimating the unknown link functions in the mean and variance functions and, moreover, work well in the case of high correlation among the predictors. The theoretical results reveal that the proposed procedures have the sure screening property when the number of predictors grows exponentially with the sample size. Furthermore, as a conditional methodology, our method is robust to the choice of the conditioning set.
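A compact sketch of the EL-CFS/mEL-CFS split described in Chapter 4, reusing el_ratio_at_zero from the previous sketch: thresholding the ratios built from Y estimates D ∩ A_μ, thresholding those built from Y² estimates D ∩ A, and the set difference serves as the estimator of the variance-only set D ∩ A_v. The helper names and the least-squares conditioning step are assumptions for illustration.

```python
import numpy as np

def screen_stats(X, target, cond_idx):
    """l_hat_j(0) for every candidate j, with `target` as the response;
    relies on el_ratio_at_zero from the CMELR-CSIS sketch above."""
    n, p = X.shape
    Xc = np.column_stack([np.ones(n), X[:, cond_idx]])
    resid = X - Xc @ np.linalg.pinv(Xc) @ X
    tc = target - target.mean()
    return {j: el_ratio_at_zero(resid[:, j] * tc)
            for j in range(p) if j not in set(cond_idx)}

def el_cfs_split(X, y, cond_idx, gamma_n):
    """EL-CFS with Y targets the mean set; mEL-CFS with Y^2 targets the
    union; their difference estimates the variance-only active set."""
    mean_set = {j for j, l in screen_stats(X, y, cond_idx).items()
                if l >= gamma_n}
    full_set = {j for j, l in screen_stats(X, y ** 2, cond_idx).items()
                if l >= gamma_n}
    return mean_set, full_set - mean_set
```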
Keywords/Search Tags:Variable selection, Duality, Ultrahigh dimensional data, Empirical likelihood, Feature screening, Heteroscedasticity, Multiple-index