
The Application of Lasso-Logistic and Group Lasso-Logistic Models for Birth Defects

Posted on: 2017-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: M J Li
Full Text: PDF
GTID: 2334330503463305
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: Risk factors for birth defects are complex, poorly understood, and correlated with one another. Traditional statistical methods produce biased estimates when there are too many variables, and they achieve only part of the goal of variable selection. This thesis applies Lasso and Group Lasso variable selection for logistic regression to identify the key factors for birth defects and to build a model that predicts the probability of a birth defect, in order to provide better guidance for prevention.

Methods: The thesis first introduces the basic principles of Lasso and Group Lasso. Group Lasso is an extension of Lasso: when a candidate factor is a multi-category variable, it is selected or dropped as a whole unit rather than one category at a time, which makes the selected factors easier to interpret and analyze. The data come from children and their families in 6 counties in Shanxi Province, where risk factors for birth defects among mothers and their families were surveyed from 2006 to 2008. Surveyors collected 35058 valid questionnaires, including 493 cases with birth defects. From these, 38 indicators were extracted, and dummy variables were created for the multi-category indicators, giving 37 groups comprising 50 variables. Occurrence of a birth defect was the dependent variable and the other indicators were the independent variables. The models were fitted to these data and their predictive performance was assessed.

Results: Lasso and Group Lasso both performed well at variable selection. Maternal age, place of residence, family income, consanguineous marriage, birth defects among relatives, maternal anemia in early pregnancy, a history of spontaneous abortion, colds and fever, taking cold medicine or antibiotics in early pregnancy, frequent contact with pets, living near pollution sources, and family factors such as frequent smoking and drinking had an important influence on birth defects. Frequently eating meat and vegetables and taking folic acid in early pregnancy were associated with a lower occurrence of birth defects. Because the birth defects data are imbalanced, prediction models based on logistic regression were evaluated using TPR, TNR, G-mean and AUC. The Lasso and Group Lasso models showed higher predictive performance on the test set and generalized well.

Conclusion: Lasso and Group Lasso applied to the logistic regression model can select variables that explain birth defects, and the resulting models predict effectively. Lasso yields a more concise model than Group Lasso, whereas Group Lasso, by selecting whole-unit variables, better explains the factors studied and is more meaningful in practical application...
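The four evaluation measures named in the abstract can be illustrated with a minimal sketch. The labels and scores below are invented toy values, not data from the thesis, and the 0.5 threshold and the function names are assumptions made for illustration. The point of reporting these measures rather than plain accuracy is the class imbalance: with 493 cases among 35058 questionnaires (about 1.4%), a model that predicts "no defect" for everyone would be about 98.6% accurate while having a TPR and a G-mean of 0.

```python
from math import sqrt

def confusion_rates(y_true, y_score, threshold=0.5):
    """Return (TPR, TNR) for binary labels (1 = case) at a probability threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        predicted_case = s >= threshold
        if y == 1:
            if predicted_case:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_case:
                fp += 1
            else:
                tn += 1
    tpr = tp / (tp + fn)  # sensitivity: share of true cases detected
    tnr = tn / (tn + fp)  # specificity: share of non-cases correctly cleared
    return tpr, tnr

def g_mean(tpr, tnr):
    """Geometric mean of TPR and TNR; zero whenever either class is missed entirely."""
    return sqrt(tpr * tnr)

def auc(y_true, y_score):
    """AUC as the probability that a random case outscores a random non-case.

    Ties count as one half. This pairwise form is quadratic in sample size,
    which is fine for a toy example but slow for tens of thousands of rows.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy, invented example: 2 cases among 8 records (hypothetical scores).
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05, 0.2]

tpr, tnr = confusion_rates(y_true, y_score)
g = g_mean(tpr, tnr)
area = auc(y_true, y_score)
print(f"TPR={tpr:.3f} TNR={tnr:.3f} G-mean={g:.3f} AUC={area:.3f}")
```

TPR, TNR and G-mean depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, so the two kinds of measure are complementary.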
Keywords/Search Tags:Birth Defects, Variable Selection, Lasso, Group Lasso, Logistic Regression Model