Multiple Linear Regression Analysis And Multiple Logistic Regression Analysis Methods For Complex Random Sampled Data And Their Application

Posted on: 2016-06-30 | Degree: Master | Type: Thesis
Country: China | Candidate: R Y Sun
GTID: 2284330461493426 | Subject: Epidemiology and Health Statistics
Abstract/Summary:
[Objective] Most researchers who run multiple regression on data obtained by complex random sampling still choose ordinary multiple linear regression and ordinary multiple logistic regression, which assume simple random sampling and do not incorporate the sampling weight into the estimates of the regression coefficients. This study aims to draw researchers' attention to that problem by comparing, in principle and by simulation, four modeling strategies: no weight, the sampling weight only, the observation weight only, and the comprehensive weight. It also constructs the concepts and calculation principles of the observation weight and the comprehensive weight, and uses simulation to investigate the role of these two weights in multiple linear and multiple logistic regression analysis of complex random sampled data.

[Content] First, the principles of multiple linear and multiple logistic regression analysis for complex random sampled data were studied by collecting, reading, organizing, and summarizing the literature in large databases. Then, drawing on the idea of the weight coefficient in comprehensive evaluation, the concept of the observation weight was constructed: a statistic that reflects the importance of each individual or observation to the population. The concept and calculation principles of the observation weight, and of the comprehensive weight derived from it, were constructed and refined. Following the Monte Carlo simulation approach, the complete data were first treated as the sampling population, from which stratified random samples were drawn at different sampling rates. Multiple linear and multiple logistic regression models were then fitted with the sampling weight, the observation weight, the comprehensive weight, and no weight, and the accuracy, stability, and sensitivity of the results were compared and discussed. Next, the data were treated as the sample, every observation was assigned a sampling weight under each sampling rate, and models were fitted under the four weighting schemes to verify the conclusions above.

[Methods]
1. The principles of multiple linear and multiple logistic regression analysis for complex random sampled data were studied through extensive collection, reading, organization, and summary of the literature in large databases.
2. The vague concepts of weight and sampling weight were clarified, and the new concepts of the observation weight and the comprehensive weight, together with the methods for determining them, were developed. This provides theoretical support and guidance for subsequent research on weights.
3. Simulation study: To compare multiple regression analysis with the sampling weight, the observation weight, the comprehensive weight, and no weight, we used data from the 2009-2013 survey of the U.S. Nutrition and Health Research Center, with age group (in 10-year intervals) as the stratification factor. We first drew stratified random samples from the data at sampling rates from 5% to 95% in steps of 10%, then performed multiple linear and multiple logistic regression analysis under the four modeling strategies, compared the results, and examined how well the models fit under each strategy. We then treated the data as the sampled dataset and fitted multiple linear and multiple logistic regression models under the four strategies to verify the conclusions above.

[Results]
1. The model construction and parameter estimation methods of multiple linear and multiple logistic regression analysis for complex sampling survey data were compared, as were the mathematical principles and conditions of the least squares method (LSM) versus the design-weighted least squares method (DWLS), and of the maximum likelihood method (MLM) versus the pseudo maximum likelihood method (PMLM). The comparison showed that, when fitting a multiple linear regression model, incorporating the sampling weight of the survey data into the analysis allows more accurate estimation of the regression coefficients and more accurate statistical prediction of the outcome variable.
2. The vague concepts of weight and sampling weight were clarified, and the new concepts of the observation weight and the comprehensive weight, together with the methods for determining them, were developed. This provides theoretical support and guidance for subsequent research on weights.
3. The simulation study of multiple linear regression analysis on complex sampling survey data gave the following results. Under the strategy with no weight, the number of independent variables entering the model varied across sampling rates and was smaller than it should have been. Its standard errors were the largest, the root mean square error (which reflects model fit) was large, and the coefficient of determination was small, so the accuracy, precision, and sensitivity of model fitting were poor. Under the strategy with the sampling weight only, the number of independent variables entering the model varied across sampling rates and became stable only when the sampling rate reached 85%. The root mean square error and the coefficient of determination were about the same as under the strategy with no weight. Compared with that strategy, the accuracy, precision, and sensitivity of model fitting improved but did not meet the researchers' target. Under the strategy with the observation weight only, the number of independent variables entering the model became stable at a sampling rate of 25%. The coefficient estimates remained stable, the root mean square error fell significantly, and the coefficient of determination rose significantly and was close to 1, so the accuracy, precision, and sensitivity of model fitting improved significantly. However, because this strategy rests on the assumption of simple random sampling, it is not recommended. Under the strategy with the comprehensive weight, the number of independent variables entering the model became stable at a sampling rate of 35%. The coefficient estimates remained stable, the root mean square error was the smallest, and the coefficient of determination was close to 1, so the accuracy, precision, and sensitivity of model fitting were the best.
4. The simulation study of multiple logistic regression analysis on complex sampling survey data gave the following results. Under the strategy with no weight, the number of independent variables entering the model varied across sampling rates and became stable only when the sampling rate reached 85%. The AIC and SC values, which reflect model fit, were large and the coefficient of determination was small, so the accuracy, precision, and sensitivity of model fitting were poor. Under the strategy with the sampling weight only, the number of independent variables entering the model varied across sampling rates and became stable when the sampling rate reached 65%. The AIC and SC values were large and the coefficient of determination was small. Compared with the strategy with no weight, the accuracy, precision, and sensitivity of model fitting improved but did not meet the researchers' target. Under the strategy with the observation weight only, the number of independent variables entering the model became stable at a sampling rate of 35%. The coefficient estimates remained stable, the AIC and SC values fell significantly, and the coefficient of determination rose significantly and was close to 1, so the accuracy, precision, and sensitivity of model fitting improved significantly. However, because this strategy rests on the assumption of simple random sampling, it is not recommended. Under the strategy with the comprehensive weight, the number of independent variables entering the model became stable at a sampling rate of 25%. The coefficient estimates remained stable, the AIC and SC values were the smallest, and the coefficient of determination was close to 1, so the accuracy, precision, and sensitivity of model fitting were the best.

[Conclusion] When fitting the multiple linear regression model and the multiple logistic regression model to complex sampling survey data, incorporating the comprehensive weight into the statistical analysis yields more accurate, precise, and sensitive estimates of the regression coefficients and statistical predictions of the outcome variable.
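
The contrast between ordinary least squares and design-weighted least squares can be illustrated with a minimal sketch. It is not the thesis's procedure: the population, the three strata, their slopes, and the per-stratum sampling rates are all made-up assumptions, and only the sampling weight (stratum population size divided by stratum sample size) is shown, because the abstract does not give formulas for the observation weight or the comprehensive weight. The helper `fit_wls` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite population with three strata (standing in for age groups).
stratum_sizes = np.array([6000, 3000, 1000])
slopes = np.array([1.0, 2.0, 3.0])           # assumed: the x-y relation differs by stratum
sampling_rates = np.array([0.05, 0.20, 0.80])  # assumed: unequal sampling rates

stratum = np.repeat(np.arange(3), stratum_sizes)
x = rng.normal(size=stratum.size)
y = 0.5 + slopes[stratum] * x + rng.normal(size=stratum.size)


def fit_wls(x, y, w):
    """Solve the weighted normal equations (X'WX) b = X'Wy for [intercept, slope]."""
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w  # scales each observation's column by its weight
    return np.linalg.solve(XtW @ X, XtW @ y)


# Stratified random sampling without replacement; sampling weight = N_h / n_h.
sample_idx, weights = [], []
for h in range(3):
    idx = np.flatnonzero(stratum == h)
    n_h = int(round(sampling_rates[h] * idx.size))
    sample_idx.append(rng.choice(idx, size=n_h, replace=False))
    weights.append(np.full(n_h, idx.size / n_h))
sample_idx = np.concatenate(sample_idx)
w = np.concatenate(weights)

xs, ys = x[sample_idx], y[sample_idx]

beta_pop = fit_wls(x, y, np.ones_like(x))          # census fit on the full population
beta_unweighted = fit_wls(xs, ys, np.ones_like(xs))  # ordinary least squares on the sample
beta_weighted = fit_wls(xs, ys, w)                 # design-weighted least squares

print("population slope:", beta_pop[1])
print("unweighted slope:", beta_unweighted[1])
print("weighted slope:  ", beta_weighted[1])
```

Because the strata are sampled at unequal rates and have different slopes, the unweighted fit is pulled toward the oversampled stratum, while the weighted fit targets the coefficient that a census of the whole population would give. The sampling weights sum to the population size by construction.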
Keywords/Search Tags: Complex sampling, Sampling weight, Observation weight, Comprehensive weight, Multiple linear regression analysis, Multiple logistic regression analysis