Parallelization And Optimization Of GPU Computation For Genetic Analysis Methods

Posted on: 2015-05-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F T Zhang
Full Text: PDF
GTID: 1220330431488965
Subject: Bioinformatics
Abstract/Summary:
This dissertation targets the parallelization and optimization of computation for statistical genetics methods based on mixed linear models, which are used to detect loci and interactions underlying complex traits and diseases. The emergence of next-generation sequencing (NGS) gives scientists a great opportunity to understand the mechanisms of organisms, but it also brings the great challenge of analyzing huge volumes of data. Upgrades in computer hardware cannot keep pace with the explosive growth of NGS data. At present there are more than 53 million SNPs in the dbSNP database. If the interactions of these SNPs are analyzed by brute force, the number of SNP pairs reaches the peta-scale. Here, we developed association methods based on omics data that use CPU-GPU heterogeneous parallel computing. Moreover, we developed a self-adaptive load balancing method and a matrix compression method for the coefficient matrix of the mixed linear model. We also developed an optimization method based on i-node. To shorten the development cycle and improve program performance, we developed a loop penetrating method.

This dissertation consists of four chapters.

Chapter 1 is an overall introduction to the history and analysis methods of statistical genetics. It introduces NGS technology and the opportunities and challenges it brings. It chiefly introduces GPU parallel computing technology, including the working principle of the GPU, the GPU architecture, the execution model of GPU threads and the GPU memory model. The chapter closes with mixed linear models and the methods for estimating the variance components, the fixed effects and the random effects.

Chapter 2 mainly introduces the methods of classical statistical genetics and the strategies for their parallelization and optimization. It first introduces the classical statistical genetics methods we used for genetic models, including the agronomy model, animal model, seed model and regional trial model. The chapter emphasizes the implementation of classical statistical genetic methods on the GPU. Because the general-purpose graphics processing unit (GPGPU) is a new technology and its capacity for automatic optimization is very limited, the code has to be optimized manually. We introduce instruction optimization, optimization of global memory space, optimization of global memory access and optimization of branches. Finally, we introduce the classical statistical genetics software QGAStation 2.0, implemented with the CUDA parallel computing platform and programming model.

Chapter 3 introduces association methods, epistasis methods and gene-by-environment interaction methods based on regression models, together with the strategies for parallelizing and optimizing them for GPU computing. Genotype values can be stored as integers, but expression abundance can only be stored as floating-point values. To address this, we first introduce the method for analyzing genome data and then the method for transcriptome, proteome and metabolome data. We illustrate how to parallelize the analyses and how to optimize the programs. At the end of the chapter, we introduce QTXNetwork, a novel GWAS software package based on omics data. This software can analyze bio-marker data and omics data with high performance, accelerated by multiple GPUs.

Chapter 4 summarizes the key technologies we used when parallelizing and optimizing the analyses. First, we briefly introduce the key technologies and methods we proposed and used. Second, we detail the key technologies in classical genetic analyses and omics data analyses. Finally, we discuss how to achieve excellent parallelization and optimization.
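
As a rough check on the peta-scale figure quoted in the abstract, the number of unordered SNP pairs among n SNPs is n(n - 1)/2. The short Python sketch below evaluates this for the lower bound of 53 million SNPs given in the abstract. The function name `snp_pair_count` and the constant `N_SNPS` are illustrative assumptions and are not taken from the dissertation or its software.

```python
# Illustrative arithmetic only: counts unordered SNP pairs for a brute-force
# pairwise (epistasis) scan. Names here are hypothetical, not from the thesis.

def snp_pair_count(n_snps: int) -> int:
    """Return the number of unordered pairs among n_snps SNPs: n * (n - 1) / 2."""
    if n_snps < 0:
        raise ValueError("n_snps must be non-negative")
    return n_snps * (n_snps - 1) // 2


# Lower bound from the abstract: "more than 53 million SNPs" in dbSNP.
N_SNPS = 53_000_000

pairs = snp_pair_count(N_SNPS)
print(f"{pairs:.3e} SNP pairs")  # prints "1.404e+15 SNP pairs"
```

At roughly 1.4 x 10^15 pairs, an exhaustive scan is in the peta range, which is consistent with the abstract's motivation for GPU acceleration.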
Keywords/Search Tags: Complex traits, GPU, GWAS, Epistasis, Genotype by environment interaction, Mixed linear model