| Objective: Genome-wide association studies (GWAS) are based on single nucleotide polymorphisms (SNPs), which have been widely used in genetic association studies to search for susceptibility genes for complex diseases. Because single-SNP association analysis has shortcomings and limitations, a growing number of studies focus on developing gene-based association analyses. We developed a new linkage disequilibrium (LD) based method and used the Monte Carlo method to evaluate it alongside other popular gene-based methods, in order to explore the advantages and disadvantages of these methods. Finally, we applied the new method to real GWAS data for coronary artery disease (CAD) to detect susceptibility modules and genes for CAD, which may provide novel clues to CAD pathogenesis. Methods: 1. Use the Monte Carlo method to simulate gene-based genetic association data. First, we assumed that the genotyped SNPs are continuous variables whose joint distribution is approximately multivariate normal, so continuous simulated data were generated according to the LD coefficient matrix. Second, the continuous simulated data were transformed into discrete data according to the genotype frequencies of cases and controls. 2. Use the Monte Carlo method to evaluate gene-based association methods. We developed a new gene-level association analysis method based on LD (LD-Fisher). First, a haplotype analysis algorithm was used to analyze the genetic LD structure in order to obtain relatively independent haplotype blocks and the most significant SNPs in each haplotype block; the Fisher combination method was then used to obtain gene-level results. Monte Carlo methods were used to generate simulated genetic data according to parameters including the minor allele frequency, the correlation coefficients between the SNPs and the disease, the number of SNPs, the number of haplotype blocks, the number of susceptibility SNPs, and the LD structure of the SNPs. 
Then the gene-based association methods, including LD-Fisher, were evaluated on these simulated data. 3. Apply gene-based genetic association analysis methods to real GWAS data for coronary artery disease (CAD) and mine susceptibility network modules and genes. Based on the gene-level association analysis of the CAD GWAS data, we constructed a CAD-related biological information network and analyzed its modules and genes to search for susceptibility genes for CAD. Results: 1. We simulated gene-based genetic association data using the Monte Carlo method on the SAS platform. The minor allele frequencies and LD structures of the simulated data were very close to the preset parameters. 2. Among the gene-based genetic association analysis methods, principal component analysis with logistic regression (PCA-logistic) and our LD-Fisher method performed best. Regardless of the number of haplotype blocks, the power of PCA was always close to 1 when a higher threshold for the cumulative contribution rate was used (95%, PCA95). However, the results were less good when the threshold was reduced (85%, PCA85). Our LD-Fisher method overcame the shortcomings of the Fisher combination method: in the one-haplotype-block scenario its power was always close to 1 and slightly lower than that of PCA95, while in the multiple-haplotype-block scenario its power was close to that of PCA95. 3. Four susceptibility modules for CAD were discovered by gene-based analysis and network analysis, including one module consisting of 15 functionally interconnected sub-modules. We found that MAPK10 (OR = 32.5, P = 3.5 × 10^-11) and COL4A2 (OR = 2.7, P = 2.8 × 10^-10), both in the top-scoring module related to the focal adhesion pathway, were the most significant genes. The significance of these two genes was further validated by two other gene-based association tests and by an independent GWAS dataset. Conclusion: 1. 
Our Monte Carlo simulation method can generate simulated gene-based association data that meet the preset parameters, and the simulated data can be used to evaluate and analyze gene-based association analysis methods. 2. Our gene-based association analysis method (LD-Fisher) and PCA-logistic have higher power across a range of parameter settings. LD-Fisher can be used for gene-based association analysis of complex diseases. 3. We found that gene-based association analysis combined with network analysis can improve on association analysis based on SNPs alone, providing clues for research on disease susceptibility and for clarifying the pathogenesis of complex diseases. |
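
The two computational ideas in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation (the abstract states that the simulation was run on the SAS platform), and every name in it (`cholesky`, `simulate_genotypes`, `fisher_combine`, the example correlation matrix and genotype frequencies) is a hypothetical choice made for illustration. The sketch assumes that the LD coefficient matrix can be used directly as the correlation matrix of a latent multivariate normal vector, that genotypes are coded 0, 1, 2, and that the block-level P values passed to Fisher's method are independent. The haplotype-block detection step of LD-Fisher is not shown.

```python
import math
import random
from statistics import NormalDist


def cholesky(a):
    """Lower-triangular L with L * L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_genotypes(n, corr, geno_freqs, rng):
    """Simulate n individuals at len(corr) SNPs.

    corr: latent correlation matrix (assumed stand-in for the LD coefficient matrix)
    geno_freqs: per-SNP frequencies of genotypes 0, 1, 2 (all three must be > 0)
    """
    low = cholesky(corr)
    std = NormalDist()
    # Cut points on the standard normal scale from cumulative genotype frequencies.
    cuts = [(std.inv_cdf(f[0]), std.inv_cdf(f[0] + f[1])) for f in geno_freqs]
    m = len(corr)
    data = []
    for _ in range(n):
        # Independent standard normals, then correlate them with the Cholesky factor.
        e = [rng.gauss(0.0, 1.0) for _ in range(m)]
        z = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(m)]
        # Genotype = number of cut points the latent value exceeds (0, 1, or 2).
        data.append([(z[i] > cuts[i][0]) + (z[i] > cuts[i][1]) for i in range(m)])
    return data


def fisher_combine(pvalues):
    """Fisher's method for k independent P values.

    Under the null, -2 * sum(ln p) follows a chi-square distribution with 2k
    degrees of freedom; for even degrees of freedom the survival function has
    the closed form used below.
    """
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # half of the test statistic
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))


# Hypothetical example: two SNPs in strong LD, simulated for one group.
rng = random.Random(0)
corr = [[1.0, 0.8], [0.8, 1.0]]
freqs = [(0.49, 0.42, 0.09), (0.64, 0.32, 0.04)]
genotypes = simulate_genotypes(2000, corr, freqs, rng)
print(genotypes[:3])
print(fisher_combine([0.01, 0.20, 0.03]))
```

To mimic a case-control design under these assumptions, `simulate_genotypes` would be called twice with different genotype frequencies for cases and controls. Note that thresholding a latent normal variable attenuates the correlation, so the correlation between the discrete genotypes is generally somewhat weaker than the latent value supplied in `corr`.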