An Exploration On The Application Of Association Rules In Selecting SNP Associated With Cancer

Posted on: 2011-06-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y Bi
Full Text: PDF
GTID: 2154360305498252
Subject: Epidemiology and Health Statistics
Abstract/Summary:
BACKGROUND
Research on the association between single nucleotide polymorphisms (SNPs) and cancer prevalence now focuses on how to filter out the SNPs that are truly associated with cancer. Because different screening methods have been applied, many related studies have reported inconsistent results. Association rule mining, an important data mining method, is expected to uncover potential associations between items in large databases. This research attempts to apply association rules to the screening of SNPs associated with cancer.
OBJECTIVE
This research aims to apply association rules to the screening of SNPs associated with cancer, to reduce the false positive and false negative rates of that screening, and further to explore the use of association rules in screening SNP-SNP interactions.
METHODS
This research evaluated, through simulation, how well association rules work for screening SNPs associated with cancer. The technical route was as follows:
1) Within the framework of a case-control study, we built a logistic simulation model with cancer prevalence as the dependent variable and SNPs and SNP-SNP interactions as the independent variables, and generated random samples from it.
2) We used bootstrap resampling to generate bootstrap sub-samples, applied association rule analysis to each sub-sample, and then screened SNPs associated with cancer by combining these results with stepwise logistic regression.
3) We applied association rule analysis to the simulated samples, split up the consequents ("back items") of the rules, and then screened SNP-SNP interactions associated with cancer by combining these results with logistic regression under the score criterion.
4) We analyzed a real dataset of SNPs and cancer prevalence to validate the proposed method.
RESULTS
By combining association rules with stepwise logistic regression, the proposed method reduced the false negative rate of screening without raising the false positive rate.
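Steps 1) and 2) can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state. SNPs are coded additively as 0/1/2 minor alleles with a common minor-allele frequency. Each rule has the form {carrier of SNP i} → {case}. The χ² p value comes from a 2×2 test without continuity correction. All sample sizes, effect sizes, and thresholds are purely illustrative, and the function names are hypothetical. The stepwise logistic regression itself is not reproduced. The sketch stops at the candidate SNPs that would be handed to it, keeping a SNP when its rule passes in at least half of the bootstrap sub-samples; that cut-off is also an assumption.

```python
import math
import random


def simulate_case_control(n_cases, n_controls, maf, beta0, betas, interaction, rng):
    """Hypothetical logistic simulation model under case-control sampling.

    Each SNP is coded 0/1/2 (number of minor alleles, frequency `maf`) and
    logit P(case) = beta0 + sum(betas[i] * x_i) + g * x_j * x_k,
    where interaction = (j, k, g). Individuals are drawn from this population
    until the requested numbers of cases and controls have been collected.
    """
    j, k, g = interaction
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        x = [(rng.random() < maf) + (rng.random() < maf) for _ in betas]
        eta = beta0 + sum(b * xi for b, xi in zip(betas, x)) + g * x[j] * x[k]
        if rng.random() < 1.0 / (1.0 + math.exp(-eta)):
            if len(cases) < n_cases:
                cases.append(x)
        elif len(controls) < n_controls:
            controls.append(x)
    return cases + controls, [1] * n_cases + [0] * n_controls


def rule_stats(X, y, i):
    """Support, confidence and chi-square p value (1 df, no continuity
    correction) of the rule {SNP_i carrier} -> {case}."""
    a = b = c = d = 0  # 2x2 table: carrier / non-carrier by case / control
    for x, yi in zip(X, y):
        carrier = x[i] >= 1
        if carrier and yi:
            a += 1
        elif carrier:
            b += 1
        elif yi:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    confidence = a / (a + b) if a + b else 0.0
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    return a / n, confidence, math.erfc(math.sqrt(chi2 / 2.0))


def bootstrap_rule_frequency(X, y, n_boot, min_sup, min_conf, alpha, rng):
    """Fraction of bootstrap sub-samples in which each SNP's rule passes
    all three thresholds (support, confidence, chi-square p value)."""
    n, m = len(y), len(X[0])
    hits = [0] * m
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        Xb, yb = [X[t] for t in idx], [y[t] for t in idx]
        for i in range(m):
            sup, conf, p = rule_stats(Xb, yb, i)
            if sup >= min_sup and conf >= min_conf and p < alpha:
                hits[i] += 1
    return [h / n_boot for h in hits]


rng = random.Random(2011)
betas = [1.0, 0.0, 0.0, 0.6, 0.0]  # hypothetical: SNP 0 and SNP 3 are causal
X, y = simulate_case_control(500, 500, 0.3, -2.0, betas, (1, 2, 0.0), rng)
freq = bootstrap_rule_frequency(X, y, 50, 0.05, 0.52, 0.05, rng)
# Candidates that would then be passed to stepwise logistic regression.
candidates = [i for i, f in enumerate(freq) if f >= 0.5]
print(candidates)
```

Counting how often each rule survives across resamples, rather than screening the rules once, makes the result depend less on a single noisy sample. The abstract does not say exactly how the thesis combines the bootstrap output with the regression step.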
We applied association rules combined with logistic regression under the score criterion to screen SNP-SNP interactions. With this method, 38.2% of the top-ranked models included the interaction terms of the original model.
CONCLUSION
By using support, confidence, and the p value of the χ² test to screen association rules roughly, and then combining the result with bootstrap resampling and stepwise logistic regression, we can filter out SNPs associated with cancer with a lower false negative rate than traditional methods. By using the same three criteria to screen association rules roughly, splitting up the consequents of the rules, and then combining the result with logistic regression under the score criterion, we proposed a method to filter out SNP-SNP interactions associated with cancer. This method provides a useful starting point for further study of SNP-SNP interaction selection.
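The interaction screening described above can be sketched in the same spirit, as a hypothetical illustration rather than the thesis's own procedure. Each candidate pair is scored through a two-item rule {carrier of SNP j, carrier of SNP k} → {case} using the same three criteria (support, confidence, χ² p value), and the surviving pairs are ranked by p value. The simulation (additive 0/1/2 coding, a single interaction with no main effects), the thresholds, and the function names are all assumptions. The abstract does not describe how the thesis splits the rule consequents or fits the score-criterion logistic regression, so neither step is reproduced here.

```python
import itertools
import math
import random


def simulate(n_cases, n_controls, n_snps, maf, beta0, pair, gamma, rng):
    """Hypothetical case-control simulation with one SNP-SNP interaction
    term gamma * x_j * x_k and no main effects (additive 0/1/2 coding)."""
    j, k = pair
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        x = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n_snps)]
        p = 1.0 / (1.0 + math.exp(-(beta0 + gamma * x[j] * x[k])))
        if rng.random() < p:
            if len(cases) < n_cases:
                cases.append(x)
        elif len(controls) < n_controls:
            controls.append(x)
    return cases + controls, [1] * n_cases + [0] * n_controls


def pair_rule_stats(X, y, j, k):
    """Support, confidence and chi-square p value (1 df) of the rule
    {SNP_j carrier, SNP_k carrier} -> {case}."""
    a = b = c = d = 0  # 2x2 table: joint carrier / other by case / control
    for x, yi in zip(X, y):
        both = x[j] >= 1 and x[k] >= 1
        if both and yi:
            a += 1
        elif both:
            b += 1
        elif yi:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    confidence = a / (a + b) if a + b else 0.0
    return a / n, confidence, math.erfc(math.sqrt(chi2 / 2.0))


def screen_pairs(X, y, min_sup, min_conf, alpha):
    """Keep SNP pairs whose rule passes all three thresholds, ranked by p value."""
    kept = []
    for j, k in itertools.combinations(range(len(X[0])), 2):
        sup, conf, p = pair_rule_stats(X, y, j, k)
        if sup >= min_sup and conf >= min_conf and p < alpha:
            kept.append(((j, k), p))
    return sorted(kept, key=lambda t: t[1])


rng = random.Random(7)
X, y = simulate(500, 500, 6, 0.3, -1.5, (2, 4), 0.9, rng)  # hypothetical true pair: (2, 4)
candidates = screen_pairs(X, y, 0.02, 0.55, 0.01)
print(candidates[:3])
```

In this toy setting the simulated interacting pair should rank first. Pairs that share only one of the interacting SNPs can still pass the rough screen, which is why a follow-up regression step, such as the score-criterion logistic regression named in the abstract, is still needed.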
Keywords/Search Tags: Single nucleotide polymorphism, association rules, cancer, screening