Penalized Methods And The Application In Post-Genome-wide Association Study
Posted on: 2015-05-02 | Degree: Doctor | Type: Dissertation | Country: China | Candidate: J W Gou | GTID: 1224330467459568 | Subject: Epidemiology and Health Statistics
Abstract/Summary:
With the completion of the Human Genome Project, the progress of the HapMap Project, and the establishment and improvement of high-throughput technology platforms, genome-wide association studies (GWAS) have identified a large number of genetic variants associated with complex diseases and have built large-scale genomic databases of single nucleotide polymorphisms (SNPs). In-depth analysis of GWAS data to identify novel genetic susceptibility loci and to find the missing heritability of complex diseases, known as post-GWAS analysis, has become one of the hot topics in bioinformatics and biostatistics. Accordingly, several new research strategies have been applied in post-GWAS studies, including the analysis of gene-gene interactions in high-dimensional genomic data, gene-based or SNP-set analysis of GWAS data, and rare variant association studies via next-generation sequencing. All of these studies, however, require the analysis of high-dimensional datasets, which is not trivial and requires specialized skills. When genetic variants are treated as the independent variables in a regression analysis, the number of variables far exceeds the number of samples, and many of the variables are noisy or redundant. How to exclude non-informative variables from the association analysis is the problem of variable selection. Penalized regression methods have proven successful in high-dimensional variable selection: they add a penalty to the loss function that shrinks the coefficients of noisy variables toward zero and removes such variables from the model, and can therefore be used for variable selection in high-dimensional data.
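The idea of adding a penalty to the loss function can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes a squared-error loss with a LASSO (L1) penalty, fitted by cyclic coordinate descent, and the function names, the toy data and the value of `lam` are hypothetical choices made only for illustration.

```python
import numpy as np


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return exactly 0 when |z| <= lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Illustrative only: no intercept, fixed number of sweeps,
    and every column of X is assumed to be non-zero.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * ||X_j||^2 for each column
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the partial residual
            # (the residual with the contribution of feature j added back).
            rho = X[:, j] @ residual / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            # Keep the residual in sync with the updated coefficient.
            residual += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta


# Toy data (hypothetical): more predictors than samples, 3 informative predictors.
rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize columns
true_beta = np.zeros(p)
true_beta[:3] = [2.0, -1.5, 1.0]
y = X @ true_beta + 0.5 * rng.standard_normal(n)
y = y - y.mean()                            # center the response

beta_hat = lasso_coordinate_descent(X, y, lam=0.3)
selected = np.flatnonzero(beta_hat)         # indices of variables kept in the model
print("number of selected variables:", selected.size)
```

The soft-thresholding step is what sets coefficients exactly to zero, so the fitted model doubles as a variable selector. Larger values of `lam` remove more variables, and any `lam` above `max(|X.T @ y|) / n` removes all of them.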
Many variable selection methods exist, and each has advantages and disadvantages. However, few studies have statistically evaluated these methods as applied to genome-wide association studies. We therefore review and develop several penalized regression methods for post-GWAS analysis, and implement fast and efficient algorithms for real-world problems, in order to find the "missing heritability" and to provide guidance for practical applications. The full text is structured as follows:

Part I: A modified penalized regression method is proposed and applied to detecting interactions in large-scale genomic studies. We proposed a stability selection procedure in penalized regression to detect gene-gene interactions. Because selecting the tuning parameters of penalized regression by cross-validation tends to over-fit, we built a relationship between the false discovery rate (FDR) and the threshold of stability selection, so that the proposed method can control the FDR. We compared it with traditional penalized methods in extensive simulations, which showed higher power and lower FDR for the new method. Finally, we applied the proposed stability SCAD method to a previously published GWAS dataset of lung cancer.

Part II: A systematic assessment of group penalized methods applied to SNP-set-based association analysis. This study evaluated four group penalized methods in gene-based association studies. We simulated case-control genome-wide association studies based on real genotype data structure, and compared the performance of the four methods in selecting causal genes under the following scenarios: different numbers of causal genes, different numbers of SNPs in each causal gene, different numbers of causal SNPs in each causal gene, different directions of the effects of causal SNPs, etc.
We also analyzed a real GWAS data set consisting of 22 pathways associated with lung cancer using the group penalized methods.

Part III: We proposed weighted bi-level penalized methods and applied them to the analysis of rare variants. Because rare variants may play an important role in the disease process, we propose a weighted group exponential LASSO penalty. The variants are weighted so that the lower the allele frequency of a SNP, the weaker the penalty on it. Using simulated rare variant data, we compared the new method with other methods. Finally, the analysis of the GAW17 data again demonstrated the superiority of the weighted method.

Part IV: The false discovery rate (FDR) was introduced into penalized regression. When penalized regression is applied to variable selection in high-dimensional data, it usually produces too many false positives. To solve this problem, we introduced the FDR from multiple comparisons into penalized regression. Using the Karush-Kuhn-Tucker (KKT) conditions for the optimal solution of penalized methods, we established a correspondence between the regression analysis and hypothesis testing. This shows that bringing the FDR into penalized regression is natural and reasonable, and it makes it easier to interpret whether a variable is statistically significant. In simulations based on a virtual LD structure, the new FDR-controlling penalized method was compared with the original penalized methods. The improved method was also applied to the lung cancer GWAS data.

Keywords/Search Tags: Post-Genome-wide association study, Gene-gene interaction, SNPs set, Rare variant, LASSO, SCAD, MCP, Group penalized method, Bi-level penalized regression method, Coordinate descent, BIC