A Comparative Study Of Five Association Tests Based On CpG Set For Epigenome-wide Association Studies

Posted on: 2017-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Zhang
GTID: 2284330485465781
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Some of the missing heritability in tumors and other complex diseases might be explained by epigenetic variation, especially DNA methylation at CpG sites. High-throughput technologies enable simultaneous profiling of DNA methylation at hundreds of thousands of CpGs across the genome, so the demand for statistical methods suited to high-dimensional DNA methylation data has become increasingly urgent. Liu et al. (2014) showed that the clustering of correlated DNA methylation at CpGs is similar to that of linkage-disequilibrium (LD) correlation in genetic SNP variation. They call these sets of correlated CpGs "GeMes", for genetically controlled methylation clusters. To reduce the false positive rate caused by multiple testing and to use the information shared among multiple loci, we propose CpG set based analysis of DNA methylation data and compare the performance of these methods with traditional ones.

We consider five CpG set analysis approaches: principal component analysis (PCA), supervised principal component analysis (SPCA), kernel principal component analysis (KPCA), the sequence kernel association test (SKAT) and sliced inverse regression (SIR). The traditional methods are Hotelling's T² test (T-square) and the t-test with Bonferroni correction. We compare the type I error rate and power of these seven methods through simulations. Scenarios vary the correlation coefficient among CpGs, the number of causal CpGs, the coefficients of the disease model and other settings. Simulated CpG sets are generated from both virtual datasets and real methylation datasets. Finally, we analyze two real DNA methylation datasets: a rheumatoid arthritis (RA) dataset and a colorectal cancer (CRC) dataset.

The simulation results show that the first six methods control the type I error at the nominal significance level, while the t-test with Bonferroni correction is slightly conservative. As correlation increases, the power of PCA, SPCA, KPCA and SKAT rises, while SIR, T-square and the t-test show no clear increasing or decreasing trend. When correlation is strong (r > 0.4), SKAT and SPCA have higher power than the t-test. SIR and T-square have lower power than the other methods. The power of KPCA rises sharply as r changes from 0.6 to 0.8, while the power of PCA increases only slightly. The results of the real data applications show that SPCA performs the best. SKAT is slightly superior to SPCA.

SPCA and SKAT can combine information from multiple loci and provide better simulated power than the other approaches. We suggest that SPCA and SKAT can be used for CpG set analysis and for screening associations across the entire epigenome.
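To make the idea of a CpG set test concrete, the following is a minimal sketch of one PCA-style set test on simulated data. It is not the thesis's implementation: the function names, the equicorrelated simulation model, the choice to shift every CpG in cases, and the use of a permutation test on the first principal component score are all assumptions made for illustration.

```python
import random
import statistics


def simulate_cpg_set(n_per_group, n_cpgs, r, effect, seed):
    """Simulate one CpG set for controls (label 0) and cases (label 1).

    Assumption: each CpG is sqrt(r)*shared + sqrt(1-r)*noise, so any two
    CpGs have pairwise correlation of about r. Cases get `effect` added to
    every CpG (all CpGs causal), which is a simplification.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_group):
            shared = rng.gauss(0, 1)
            row = [
                (r ** 0.5) * shared
                + ((1 - r) ** 0.5) * rng.gauss(0, 1)
                + effect * label
                for _ in range(n_cpgs)
            ]
            rows.append(row)
            labels.append(label)
    return rows, labels


def first_pc_scores(rows, n_iter=200):
    """Project each sample onto the first principal component.

    The leading eigenvector of the covariance matrix is approximated by
    power iteration on the column-centered data.
    """
    p = len(rows[0])
    means = [statistics.fmean(col) for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    v = [1.0 / p ** 0.5] * p
    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [sum(a * b for a, b in zip(row, v)) for row in centered]


def permutation_p_value(scores, labels, n_perm=999, seed=0):
    """Permutation test on the absolute case/control difference in mean score."""
    rng = random.Random(seed)

    def stat(lab):
        cases = [s for s, l in zip(scores, lab) if l == 1]
        controls = [s for s, l in zip(scores, lab) if l == 0]
        return abs(statistics.fmean(cases) - statistics.fmean(controls))

    observed = stat(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if stat(shuffled) >= observed:
            hits += 1
    # Add 1 to numerator and denominator so the p-value is never exactly 0.
    return (hits + 1) / (n_perm + 1)


# One set-level p-value for 10 correlated CpGs instead of 10 per-CpG tests.
rows, labels = simulate_cpg_set(n_per_group=50, n_cpgs=10, r=0.6, effect=1.0, seed=1)
p_set = permutation_p_value(first_pc_scores(rows), labels)
print(f"set-level p-value: {p_set:.3f}")
```

Repeating this over many simulated datasets with `effect=0` would give an empirical type I error rate, and with `effect>0` an empirical power, which is the kind of comparison the abstract describes across the seven methods.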
Keywords/Search Tags: epigenome-wide association study, DNA methylation, CpG set analysis