
Development Of A Rice Genomic Variation Database And Candidate Gene Prioritization Platform For Genome-wide Association Studies

Posted on: 2020-12-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Zhao
GTID: 1360330572482962
Subject: Bioinformatics
Abstract/Summary:
Advances in sequencing technology and falling sequencing costs have made it easier to obtain genome-wide sequence variations for whole populations. The accumulation of resequencing data is convenient, but it also makes the data harder for researchers to use. Many sequence variation databases have therefore been released, making variation data easier to obtain and use. Given variation data and phenotype data, a genome-wide association study (GWAS) can locate trait-associated loci. However, moving from GWAS signals to the gene level remains difficult, and without identifying the functional gene we cannot clearly understand the regulatory mechanism of a trait.

Rice (Oryza sativa L.), an important food crop and a model plant for grasses, offers good research conditions and strong application value. Many GWAS have been published in rice, but most validate significant SNPs against previously cloned genes, and few genes have been cloned on the basis of GWAS results. The main reason is that rice has relatively large linkage disequilibrium (LD) regions, which makes it difficult to determine the target gene, so a candidate gene prioritization method needs to be trained and constructed. The lack of such methods for rice may stifle 'downstream' research. Developing a candidate gene prioritization method based on multi-omics data is therefore essential for prioritizing and cloning genes by GWAS.

Rapeseed (Brassica napus L.), one of the most important oil crops, has a more complicated genome than rice: it is an allotetraploid with two subgenomes, A and C, and functional genomics studies in rapeseed are still lacking. Building on the rice GWAS candidate gene prioritization platform, we built a GWAS platform and a GWAS candidate gene prioritization platform for rapeseed, which extends the applicability of the candidate gene prioritization platform.

In this study, we used rice and rapeseed as models, identified variations in their genome sequences, and combined them with phenotypic traits to develop a GWAS candidate gene identification platform. By constructing the platform for these two species, we hope that it can be extended to other species. The main results of this study are summarized as follows:

1. Construction of a rice variation database. Based on rice genome variation data, we constructed the rice genomic variation database RiceVarMap v1.0 (http://ricevarmap.ncpgr.cn/v1). We identified 6,551,358 single nucleotide polymorphisms (SNPs) and 1,214,627 insertions/deletions (INDELs) from sequencing data of 1,479 rice accessions, with an overall missing data rate below 0.42% and an estimated accuracy greater than 99%. We also designed extensive query functions and multiple practical tools. Subsequently, we constructed RiceVarMap v2.0 (http://ricevarmap.ncpgr.cn/v2) based on more rice accessions. In total, we obtained 17,397,026 genomic variations (14,541,446 SNPs and 2,855,580 small INDELs) from sequencing data of 4,503 rice accessions. To offer reliable and detailed functional information, we annotated the variations from several aspects based on multi-omics datasets. (i) SnpEff, CooVar, and PolyPhen-2 were used to quantitatively evaluate the effects of missense mutations in coding regions. (ii) Chromatin accessibility data were collected and integrated to indicate the potential effects of variations in non-coding regions. (iii) Variations significantly associated with particular phenotypes in GWAS were also annotated. In addition, we redesigned the frames and interfaces of the website and introduced better visualization tools, making RiceVarMap v2.0 more user-friendly for researchers.

2. Construction of a GWAS candidate gene prioritization platform. We constructed a comprehensive evaluation function for GWAS significant regions using multi-omics data. The evaluation function has four main parts. 1) Functional assessment of the gene. We used GO annotation, conserved domains of genes, and information on differentially expressed genes to construct functional gene sets, and used a support vector machine (SVM) to score whether each gene is related to the target trait. 2) Evaluation of the effects of variations within the gene. Each gene receives a score based on the predicted effect of each variation on the protein sequence and the GWAS P value of each variation within the gene. 3) Evaluation of the expression effects of the gene. TWAS and eQTL results are used together to assess whether gene expression levels are associated with the target trait. 4) Evaluation of the haplotype effect. P values for each candidate gene are calculated with the SKAT software. To identify candidate genes more efficiently, we also built a series of visualization methods that display detailed information for the candidate region.

3. Application of the GWAS candidate gene prioritization platform in rice. Based on the genomic variations in the RiceVarMap v2.0 database and previously collected phenotypic data, we used the platform to identify target genes for heading date and plant height. In the analysis of heading date, we precisely identified numerous cloned heading date regulatory genes, such as Ghd7, Ehd1, OsBBX14, and OsMADS15. In the analysis of plant height, we also identified multiple cloned QTL, such as sd-1, Hd3a, and Ghd7.

4. Application of the GWAS candidate gene prioritization platform in rapeseed. We first constructed a GWAS platform based on 505 rapeseed accessions. Using the previously built candidate gene prioritization platform, we then analyzed GWAS candidate genes for fatty acid composition in rapeseed. We optimized and expanded the platform based on the available data and the genome characteristics of rapeseed. Using an extreme-phenotype k-mer searching method, we successfully identified the erucic acid regulatory gene FAE1 and its functional variation site.

The applicability of the GWAS candidate gene prioritization platform is illustrated by its application to multiple species and multiple phenotypes. Using this platform, we can reduce the number of candidate genes in a candidate region, or even pinpoint the target gene directly. We hope that these databases and platforms will help functional genomics research and the genetic improvement of these crops.
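As an illustration of how the four parts of the evaluation function in result 2 might be brought together, the following is a minimal Python sketch. The abstract does not say how the four lines of evidence are combined, so the equal-weight sum of min-max scaled scores used here is an assumption, not the method of the dissertation. The names GeneEvidence and prioritize, the gene identifiers, and all numbers are hypothetical toy values chosen only to show the mechanics.

```python
import math
from dataclasses import dataclass


@dataclass
class GeneEvidence:
    """Hypothetical per-gene evidence for one GWAS candidate region."""
    gene_id: str
    function_score: float  # e.g. an SVM score; higher = more trait-related
    variant_score: float   # variation-effect score; higher = stronger effect
    expression_p: float    # P value from TWAS/eQTL evidence
    haplotype_p: float     # P value from a haplotype test such as SKAT


def neg_log10(p: float) -> float:
    # Convert a P value so that larger means stronger evidence.
    return -math.log10(max(p, 1e-300))


def min_max(values: list[float]) -> list[float]:
    # Rescale to [0, 1]; a constant column carries no ranking information.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def prioritize(genes, weights=(0.25, 0.25, 0.25, 0.25)):
    """Rank genes by a weighted sum of scaled evidence (assumed scheme)."""
    if not genes:
        return []
    columns = [
        [g.function_score for g in genes],
        [g.variant_score for g in genes],
        [neg_log10(g.expression_p) for g in genes],
        [neg_log10(g.haplotype_p) for g in genes],
    ]
    scaled = [min_max(col) for col in columns]
    totals = [
        sum(w * col[i] for w, col in zip(weights, scaled))
        for i in range(len(genes))
    ]
    ranked = sorted(zip(genes, totals), key=lambda pair: pair[1], reverse=True)
    return [(g.gene_id, round(t, 3)) for g, t in ranked]


# Toy values for illustration only; these are not results from the study.
toy_region = [
    GeneEvidence("gene_A", 0.9, 0.8, 1e-6, 1e-5),
    GeneEvidence("gene_B", 0.1, 0.2, 0.5, 0.3),
    GeneEvidence("gene_C", 0.5, 0.5, 1e-2, 1e-2),
]
ranking = prioritize(toy_region)
for gene_id, score in ranking:
    print(gene_id, score)
```

Scaling each evidence type to the same range before summing keeps any single component from dominating because of its units; a rank-based combination or trait-specific weights would be equally plausible alternatives.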
Keywords/Search Tags:Oryza sativa, Brassica napus, genome, SNP, sequence variation, GWAS, database, candidate gene prioritization