
The Methods To Evaluate The Relative Importance Of Independent Variables In Logistic Regression Models

Posted on: 2013-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhang
GTID: 2234330362975506
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: To introduce methods for evaluating the relative importance of independent variables in logistic regression models, and to implement the two most successful methods (dominance analysis and relative weight analysis) in SAS on empirical examples, so that the relative importance of independent variables can be estimated reasonably and interpreted. In addition, to present a visualization tool, the rank-odds-ratio plot, that gives a quick, intuitive overview of the relative importance of independent variables.

Methods: 1. In view of the defects of some traditional methods, dominance analysis and relative weight analysis were each applied to specific data sets to determine the relative importance of the independent variables; the estimates from the two methods on the same data were then compared with each other and with traditional methods (bivariate correlation, standardized regression coefficients, etc.). 2. Four R^2 analogues suitable for logistic regression models were introduced, and their estimates in the same model were compared and analysed in order to identify a more suitable generalized coefficient of determination for the logistic model. 3. By transforming to ranks and rescaling the odds ratios, the relative importance of the independent variables was displayed in a single graph.

Results: 1. Traditional methods (such as simple bivariate correlations and standardized regression coefficients) cannot appropriately partition shared contribution, and their results are unstable: the sum of the contribution weights exceeds the squared multiple correlation of the model, so they cannot accurately estimate relative importance when predictors are collinear. 2. When predictors are collinear, dominance analysis and relative weight analysis provide a feasible and reasonable decomposition of the explained proportion of variance: the importance weights, expressed as proportions, sum to 100%, and each estimate is always positive. Data 3.1 shows that the dominance weights of the four variables are 0.0975 (19.88%), 0.1010 (20.60%), 0.1835 (37.32%) and 0.1085 (22.12%) respectively; the resulting ranking of relative importance differs from that given by the standardized regression coefficients. The results also show that the generalized coefficients of determination R_E^2 and R_M^2 are more suitable indices for assessing variance in logistic regression, and that both complete dominance and general dominance occur. Data 3.2 shows that the rankings of relative importance given by the various statistical indicators differ, and that one predictor acts in opposite directions in the correlation and regression analyses; the relative weights of the nine variables are 0.012 (1.8%), 0.022 (3.3%), 0.077 (11.5%), 0.085 (12.6%), 0.036 (5.3%), 0.117 (17.5%), 0.088 (13.1%), 0.229 (34.1%) and 0.005 (0.8%) respectively. Data 3.3 shows that the two methods produce similar estimates of relative importance on the same data: the average absolute difference between the two sets of estimates is 0.0025; the dominance weights of the variables are 0.0004, 0.0024, 0.0007 and 0.0001, and the relative weights are 0.0003, 0.0032, 0.0007 and 0.0002 respectively. The ranking of relative importance is identical to that given by the squared standardized regression coefficients, but the squared standardized regression coefficients, expressed as proportions, sum to more than 100%; for example, the squared fully standardized regression coefficients sum to 112.8%. Furthermore, there are obvious differences between the results obtained from the advanced methods and those from the traditional methods. 3. The rank-odds-ratio plot can compare more than one predictor in the same graph and supplies more detailed information, although it shares some of the restrictions of standardized regression coefficients. Previous participation in the new rural cooperative medical scheme has the greatest impact on willingness to participate in cooperative medical care. In these populations, educational level is more important than the number of household members; people with senior secondary or higher education are more willing to participate, as are households with fewer than three members.

Conclusions: 1. When the independent variables are collinear, dominance analysis and relative weight analysis are more accurate tools for quantifying their relative importance. Both are independent of the specific structure of the model; although they rest on different mathematical theory, their estimates differ little, and the importance weights sum to the squared multiple correlation of the model. They therefore support more accurate conclusions. Dominance analysis supplies additional information, but its computation becomes complex and heavy as the number of variables increases, whereas relative weight analysis is computationally more efficient; either approach may be chosen as an alternative to the other. 2. The rank-odds-ratio plot is visually clearer than a chart. It has some restrictions, but it answers some relevant questions that are not fully addressed by existing diagnostic tools, and it is of practical use.
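
As a rough illustration of how general dominance weights partition a model's fit, the following Python sketch averages each predictor's incremental contribution over all submodels of each size. It is a minimal sketch under stated assumptions, not the SAS implementation used in the thesis: the function name `general_dominance`, the predictor names `x1` to `x3`, and the fit values in `r2` are all hypothetical, and the fit measure stands in for whichever R^2 analogue is chosen for the logistic model.

```python
from itertools import combinations


def general_dominance(predictors, r2):
    """Return general dominance weights for each predictor.

    r2 maps a frozenset of predictor names to the fit measure of the
    submodel containing exactly those predictors (empty set -> 0.0).
    """
    p = len(predictors)
    weights = {}
    for x in predictors:
        others = [v for v in predictors if v != x]
        level_means = []
        # k is the size of the submodel that x is added to.
        for k in range(p):
            gains = [
                r2[frozenset(s) | {x}] - r2[frozenset(s)]
                for s in combinations(others, k)
            ]
            level_means.append(sum(gains) / len(gains))
        # Average the per-size mean gains over all p submodel sizes.
        weights[x] = sum(level_means) / p
    return weights


# Hypothetical fit values for every submodel of three predictors
# (illustrative numbers only, not taken from the thesis data).
r2 = {
    frozenset(): 0.00,
    frozenset({"x1"}): 0.10,
    frozenset({"x2"}): 0.08,
    frozenset({"x3"}): 0.05,
    frozenset({"x1", "x2"}): 0.15,
    frozenset({"x1", "x3"}): 0.13,
    frozenset({"x2", "x3"}): 0.11,
    frozenset({"x1", "x2", "x3"}): 0.18,
}

weights = general_dominance(["x1", "x2", "x3"], r2)
full_fit = r2[frozenset({"x1", "x2", "x3"})]
for name, w in weights.items():
    # Report each weight and its share of the full-model fit.
    print(f"{name}: weight={w:.4f}, share={100 * w / full_fit:.1f}%")
```

With these toy values the three weights sum to the full-model fit of 0.18 (up to floating-point rounding), which is the additive property the abstract describes. The loop visits every subset of the other predictors, so the work roughly doubles with each added predictor, consistent with the remark that dominance analysis becomes heavy as the number of variables increases.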
Keywords/Search Tags: logistic regression, relative importance, collinearity, dominance analysis, relative weight, rank-odds-ratio plot