
Estimation Of Effects For Strongly Correlated Variables In Linear Models

Posted on: 2020-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: L Hua
Full Text: PDF
GTID: 2370330590995169
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Correct and accurate parameter estimation in a linear model is essential for sound statistical inference and prediction. The usual estimation procedure for the unknown parameters rests on the Gauss-Markov theorem, which guarantees that the least-squares estimators are unbiased with minimum variance in the class of linear unbiased estimators. One of the main assumptions of the general linear model is that the predictor variables are linearly independent. In practice, however, predictor variables are often nearly linearly dependent, a situation known as multicollinearity. The sources of multicollinearity are well documented in books on linear regression analysis and can be summarized as four types: the data collection method applied, constraints on the model or in the population, model specification, and an over-defined model. Understanding these sources helps in analysing the data and interpreting the resulting model.

Strongly correlated predictor variables are common in many fields, such as wireless communication systems and longitudinal data analysis. For example, pairs of signals with certain antenna spacings are correlated in an antenna array, and a longitudinal study typically involves several measurements on one subject over time, so within-subject measurements are correlated. When used as predictors, variables with strong or extreme correlations create multicollinearity in linear regression. The result is unusually large variances, and even wrong signs or inflated absolute values, of the unbiased estimators for the strongly correlated variables, and hence misleading statistical prediction and inference.

Several diagnostics can reveal multicollinearity. Examining the correlation matrix of the predictor variables is the simplest: inspecting its off-diagonal elements quickly identifies pairs of strongly correlated variables, although this method only detects high correlations between pairs. When several variables are jointly highly correlated, the variance inflation factor (VIF) of each predictor can identify and eliminate potentially redundant variables; one or more large VIFs indicate multicollinearity. In addition, eigensystem analysis of the correlation matrix is a useful alternative, and singular-value decomposition is a related approach that uses variance decomposition proportions to show how much each eigenvector contributes to the multicollinearity. Checking the condition number and condition indices of the correlation matrix also measures the presence of multicollinearity. Other diagnostics are occasionally useful, such as the determinant of the correlation matrix and the signs or magnitudes of the estimated parameters.

Because ordinary least-squares estimates are inaccurate for strongly correlated variables, researchers have proposed several remedies, such as ridge regression, partial ridge regression, Bayesian estimation, and principal component regression. These methods have drawbacks, however. Ridge regression, for example, shrinks all parameter estimates together regardless of whether they are correlated, achieves stability at the cost of increased bias, and selects its penalty parameter subjectively. Moreover, all of these methods are more complicated than ordinary least-squares regression.
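As a minimal sketch of two of the diagnostics described above, the following Python snippet computes VIFs and the condition number from the predictor correlation matrix. The data and variable names are illustrative only, not taken from the thesis.

```python
import numpy as np

def vif(X):
    """Variance inflation factors via the inverse correlation matrix.

    The j-th VIF is the j-th diagonal entry of the inverse of the
    predictor correlation matrix; values well above 10 are a common
    rule-of-thumb signal of multicollinearity.
    """
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

def condition_number(X):
    """Ratio of the largest to smallest eigenvalue of the correlation matrix."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
    return eigvals.max() / eigvals.min()

# Illustrative data: x2 is a near-copy of x1, so the pair is strongly correlated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)  # nearly redundant predictor
x3 = rng.normal(size=200)              # uncorrelated predictor
X = np.column_stack([x1, x2, x3])

print("VIFs:", vif(X))                           # large for x1 and x2, near 1 for x3
print("condition number:", condition_number(X))  # large under multicollinearity
```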
Even though the estimates of individual parameters for strongly correlated predictor variables are inaccurate, with large variances and even wrong signs or inflated absolute values caused by multicollinearity, some linear combinations of these variables, called group effects, can surprisingly be estimated accurately. With this in mind, we focus on accurate estimation of such group effects instead of individual parameters. Unlike methods that combat the multicollinearity caused by strongly correlated predictors, such as ridge regression, Bayesian estimation and principal component analysis, we exploit the impact of multicollinearity on parameter estimation to estimate the group effects of these variables accurately. Both theoretically and numerically, we aim to find the optimal and estimable effects in linear models with strongly correlated variables, as well as the relationships between them.

Building on the uniform correlation model, in which all predictor variables are strongly correlated with the same correlation coefficient, we develop a linear model with exponentially correlated predictor variables, called the exponential correlation model. In wireless communication systems with multichannel reception, neighbouring sub-channels are more strongly correlated than distant sub-channels, a pattern the exponential correlation model captures; for this reason, the model is frequently used in communication problems and in the performance analysis of various wireless systems. Because it characterizes decaying correlations among variables, the exponential model also approximates a general linear model with strongly correlated variables when the absolute values of the correlations are close to 1. For a standardized uniform correlation model, the average group effect of the strongly correlated variables is optimal in the class of normalized group effects, and the other estimable effects all lie around the average group effect.

Motivated by these conclusions from the uniform model, we study the impact of multicollinearity caused by exponentially correlated predictors on parameter estimation and look for the optimal and estimable group effects of these variables, as well as the relationship between the two. Theoretically, we derive the variance of the least-squares estimators of the individual parameters in the exponential correlation model, with detailed proofs. We also prove that the individual least-squares estimators for exponentially correlated predictors have unusually large variances, especially under extreme correlations, whereas the corresponding estimable group effect estimators have small variances. More importantly, we find the optimal group effect in an exponential correlation model and prove its optimality. For the exponential correlation model, we conclude both theoretically and numerically that stronger correlations among the predictors lead to larger variances of the unbiased least-squares estimators of individual parameters, but smaller variances of the corresponding unbiased optimal and estimable group effect estimators. Our numerical examples also show that the weight vectors of estimable group effects, such as the average group effect, lie in a neighbourhood of the optimal weight, so all estimable group effects lie in a small neighbourhood of the optimal one.
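The contrast in variances described above can be illustrated with a short numerical sketch. Assuming, for illustration only, a standardized design in which X'X is approximately nR (with error variance 1), where R is the exponential correlation matrix with entries R[i, j] = rho^|i-j|, the diagonal of the inverse of R gives the individual estimator variances and w'R^{-1}w the variance of a group effect with weight vector w:

```python
import numpy as np

def exponential_corr(p, rho):
    """Exponential correlation matrix: R[i, j] = rho ** |i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

p, n = 5, 100
avg_weight = np.full(p, 1.0 / p)  # weight vector of the average group effect

for rho in (0.9, 0.99, 0.999):
    R_inv = np.linalg.inv(exponential_corr(p, rho))
    # Under the standardized-design assumption X'X ~ n R (sigma^2 = 1):
    var_individual = np.diag(R_inv) / n              # individual OLS variances
    var_group = avg_weight @ R_inv @ avg_weight / n  # average group effect variance
    print(f"rho={rho}: max individual variance {var_individual.max():.2f}, "
          f"average group effect variance {var_group:.5f}")
```

As rho approaches 1, the individual variances grow without bound while the variance of the average group effect stays small, which is exactly the pattern the thesis establishes.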
This neighbourhood narrows as the correlation gets stronger. The optimal weights also have some interesting properties: they are symmetric and close to the average weights, especially when the correlation among the predictors is large. Numerically, all estimable group effect estimators are asymptotically optimal, taking nearly identical values with small variances under extreme correlation among the predictors. In particular, the average group effect is always estimable and asymptotically optimal.

For completeness and comparison, we also visualize the relationship between the optimal and estimable group effects in a uniform model. Although this relationship has been discussed before, there has been no effective way to present and interpret it. By visualizing the neighbourhood of the optimal effect in both a uniform and an exponential model, it becomes easier to uncover how the neighbourhoods relate to the correlations: estimable effects lie in a small neighbourhood of the optimal one, and this neighbourhood shrinks as the correlation among the variables increases.

The optimal and estimable group effects are of great value. First, they are meaningful for parameter estimation and inference; for example, if the optimal group effect is significant, we can reject the null hypothesis that all parameters in the group are zero and conclude that at least one of them is nonzero. Second, the optimal group effect is accurate enough for reliable predictions from the fitted model. Third, other estimable group effects can be found through their relationships with the optimal effect. Finally, an estimable effect can be used for dimension reduction: if a group effect of p strongly correlated variables is estimable, it reduces the parameter space to a line in that space.

Our numerical results also illustrate the local nature of multicollinearity, in that it has little impact on the variances of the unbiased least-squares estimators of the parameters of uncorrelated variables. This can help in estimating the individual parameters of strongly correlated variables accurately. The main idea is to impose the constraint of their accurately estimated linear combinations on these variables. Based on the accuracy of the optimal group effect of highly correlated variables, we can look for parameter estimators that are close to the true parameter and compute their distances to the origin. We regard the estimator closest to the origin as a lower bound for all feasible estimators and then find all available estimators. Combining the region of available individual estimators for the correlated variables with the constraint of an accurately estimated optimal effect, we may estimate individual effects accurately in a linear model with strongly correlated variables. Simulation studies further show that the methods and conclusions derived from the exponential correlation model apply to a general linear model, since the former approximates the latter as the correlations among the predictors tend to one.

Estimating the group effects of strongly correlated variables is an innovation that makes full use of multicollinearity rather than avoiding or remedying it. Without losing accuracy in prediction or inference, this approach is simpler to interpret, implement, and use for inference. It may therefore serve as a complementary method for dealing with multicollinearity in linear models.
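As a complementary empirical check in the same spirit as the simulations summarized above, the following Monte Carlo sketch (all settings are illustrative choices, not the thesis's simulation design) compares the sampling variability of a single coefficient estimate with that of the average group effect under strong exponential correlation:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, rho, reps = 5, 100, 0.99, 2000

# Exponential correlation matrix R[i, j] = rho ** |i - j| and its Cholesky factor.
idx = np.arange(p)
R = rho ** np.abs(idx[:, None] - idx[None, :])
L = np.linalg.cholesky(R)
beta = np.ones(p)  # illustrative true parameters

individual, group = [], []
for _ in range(reps):
    X = rng.normal(size=(n, p)) @ L.T        # predictors with correlation matrix R
    y = X @ beta + rng.normal(size=n)
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    individual.append(b[0])                  # a single coefficient estimate
    group.append(b.mean())                   # the average group effect estimate

print("sd of an individual estimate:", np.std(individual))  # large
print("sd of the average group effect:", np.std(group))     # small
```

Across replications, the individual estimate fluctuates widely while the average group effect is stable, consistent with the conclusion that group effects, not individual parameters, are what can be estimated accurately here.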
Keywords/Search Tags:least-squares linear regression, multicollinearity, exponential correlation model, the optimal group effect, estimable group effect