
Genetic Association Studies For Complex Traits Of Crops And Linear-model-based MDR Method Developing

Posted on: 2017-04-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Y Zhou
GTID: 1220330485462424
Subject: Bioinformatics
Abstract/Summary:
The genetic dissection of complex traits and diseases is particularly important for efficient molecular breeding of crops and for precision medicine in humans. Linkage analysis and association studies are the two primary approaches to gene mapping. Linkage analysis, which locates trait loci through their co-segregation with polymorphic markers within families, has been widely used over the last two decades. More recently, advances in high-throughput sequencing have produced massive amounts of genomic data, such as single nucleotide polymorphisms (SNPs), which has stimulated enthusiasm for genome-wide association studies (GWAS) of both human diseases and complex crop traits. Although GWAS has yielded promising results, it still faces the "missing heritability" problem: the SNPs identified by GWAS account for only a portion of the expected heritability, whose upper bound is given by earlier family-based studies. One possible explanation for this discrepancy is the existence of gene-gene and gene-environment interactions, which the single-locus models commonly used in current GWAS ignore.
In the present study, we attempt to address this challenge by exploring and designing new association approaches and strategies for analyzing complex traits with GWAS data. These approaches are demonstrated in three GWASs conducted in cotton, rice, and tobacco. In addition, we developed a novel method called linear multifactor dimensionality reduction (LMDR), which uses a linear model to reconstruct the kernel algorithm of MDR (multifactor dimensionality reduction, a popular computational strategy for detecting nonlinear patterns of genomic interaction in genetic association studies).
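To make the point about single-locus models concrete, the following is a minimal sketch on entirely hypothetical data, not an analysis from the dissertation. It assumes two biallelic loci coded 0/1 and a trait that is purely epistatic (the phenotype is high only when the two loci differ). The helper `r_squared` and all variable names are illustrative.

```python
import numpy as np

def r_squared(X, y):
    """Proportion of phenotypic variance explained by a least-squares fit of y on X."""
    X = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

# Hypothetical balanced population: 25 individuals per two-locus genotype.
g1 = np.repeat([0, 0, 1, 1], 25)
g2 = np.repeat([0, 1, 0, 1], 25)
y = (g1 != g2).astype(float)  # purely epistatic trait, no noise

r2_locus1 = r_squared(g1, y)                                     # single-locus scan, locus 1
r2_locus2 = r_squared(g2, y)                                     # single-locus scan, locus 2
r2_additive = r_squared(np.column_stack([g1, g2]), y)            # both loci, no interaction
r2_epistatic = r_squared(np.column_stack([g1, g2, g1 * g2]), y)  # with interaction term

print(r2_locus1, r2_locus2, r2_additive, r2_epistatic)
```

In this idealized design each locus explains essentially none of the variance on its own, while the model with the interaction term explains all of it. Real traits are rarely this extreme, but the sketch shows how heritability can go undetected by a scan that tests one locus at a time.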
The main contents of this dissertation are summarized as follows.
Chapter 1 reviews the basic concepts and classical statistical methods used in GWAS and summarizes its major challenges, such as population structure, multiple testing, and missing heritability. Some proposals for addressing these problems are then put forward.
In chapter 2, using ~0.4 million identified SNPs, a genome-wide association study of 4 fiber yield traits in Upland cotton was carried out in a natural population of 316 cultivars. Cotton is an often cross-pollinated crop, and some heterozygous genotypes were found in the SNP data set, which the commonly used simple additive model cannot handle. We therefore used a saturated model that includes additive, dominance, epistasis, and environment interaction effects simultaneously to explore the genetic basis of yield traits in Upland cotton. Although the proportion of heterozygotes was low (~0.07), dominance-related effects were detected as the major components of total heritability for all 4 traits. This shows that the small number of heterozygous genotypes had a large influence on phenotypic variation, and the study provides insight into heterozygote advantage for cotton yield traits at the molecular level. Additionally, by comparison with the corresponding reduced model, this study offers a possible explanation for the missing heritability problem in cotton.
In chapter 3, multiple genome-wide association strategies were used to investigate the genetic basis of the high yield of the super hybrid rice Xieyou9308; a RIL population derived from this hybrid was constructed and sequenced for the subsequent analyses. First, the feasibility of GWAS in this controlled experimental population was examined and discussed, because most GWAS are based on natural populations.
Then, for plant height and heading date, we simultaneously adopted three association strategies for a holistic analysis: the traditional hypothesis-free genome-wide association and two complementary hypothesis-engaged strategies, QTL-based association and gene-based association. These strategies incorporate prior knowledge about this special population, such as previously reported QTLs and annotated genes, into the association mapping. Comparative analysis identified some loci common to the strategies, which are preferred candidates for further research. This study demonstrated that association mapping in an experimental population can complement or enhance previous QTL mapping through multiple comparative analyses, and can thus provide more precise QTL information for subsequent gene cloning and marker-assisted selection.
In chapter 4, taking chromium content and total sugar in tobacco leaf as examples, the associations between four types of omics data (i.e. genomics, transcriptomics, proteomics and metabolomics) and complex phenotypes were explored. Using the available genome, transcript, protein and metabolite profiles, we identified the corresponding quantitative trait associated SNPs (QTSs), quantitative trait associated transcripts (QTTs), quantitative trait associated proteins (QTPs) and quantitative trait associated metabolites (QTMs). These intermediate molecular phenotypes (or endophenotypes) help elucidate the genotypic variation that underlies complex traits.
In the last chapter, we used the linear model framework to reconstruct the kernel algorithm of MDR. MDR is a machine-learning method aimed at detecting genomic or environmental interactions. However, it lacks clear statistical properties such as a p-value, which current MDR methods usually evaluate via permutation or the central limit theorem. To overcome these limitations, we developed LMDR.
Through a simulation study, we found that LMDR not only provides reasonable statistical properties such as a p-value (and is more computationally efficient, since no permutation is needed), but also offers a much more flexible framework for meta-analysis and conditional analysis. Additionally, LMDR is easy to implement and compatible with most extensions of MDR.
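The abstract does not spell out the LMDR algorithm, so the following is only a rough sketch of the general idea it describes: pool multi-locus genotype combinations into "high" and "low" groups, as MDR does, then test the pooled variable in a linear model instead of by permutation. Everything here is an assumption made for illustration: the quantitative-trait pooling rule (cell mean versus overall mean), the function names `mdr_pool` and `linear_test`, the simulated data, and the normal approximation used for the p-value.

```python
import math
import numpy as np

def mdr_pool(g1, g2, y):
    """MDR-style pooling for a quantitative trait: label each two-locus genotype
    cell 'high' (1) if its mean phenotype exceeds the overall mean, else 'low' (0)."""
    cells = g1 * 3 + g2  # index the 3 x 3 genotype combinations (genotypes coded 0/1/2)
    overall = y.mean()
    high = np.zeros(9, dtype=int)
    for c in np.unique(cells):
        high[c] = int(y[cells == c].mean() > overall)
    return high[cells]

def linear_test(x, y):
    """Simple linear regression of y on the pooled 0/1 variable x. Returns the slope,
    its t statistic, and a two-sided p-value from a normal approximation."""
    n = len(y)
    xc = x - x.mean()
    yc = y - y.mean()
    slope = (xc @ yc) / (xc @ xc)
    resid = yc - slope * xc
    se = math.sqrt((resid @ resid) / (n - 2) / (xc @ xc))
    t = slope / se
    p = math.erfc(abs(t) / math.sqrt(2))
    return slope, t, p

# Hypothetical data: the trait is raised when exactly one locus is heterozygous.
rng = np.random.default_rng(0)
g1 = rng.integers(0, 3, size=500)
g2 = rng.integers(0, 3, size=500)
y = ((g1 == 1) ^ (g2 == 1)).astype(float) + rng.normal(size=500)

pooled = mdr_pool(g1, g2, y)
slope, t, p = linear_test(pooled, y)
print(slope, t, p)
```

One caveat for this naive version: the high/low labels are learned from the same phenotypes that are then tested, so the p-value is optimistic. The abstract does not say how LMDR handles this, and the sketch makes no attempt to correct for it.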
Keywords/Search Tags: complex traits, association analysis, novel association strategies, mixed linear model approach, epistasis, omics, multifactor dimensionality reduction