
Construction, Analysis and Database Construction of Animal and Plant Genetic Variation Reference Panels

Posted on: 2023-12-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Gao
Full Text: PDF
GTID: 2530306842968679
Subject: Bioinformatics
Abstract/Summary:
Single nucleotide polymorphisms (SNPs) are widely used in genome-wide association studies (GWAS) and play vital roles in the genetic breeding and population genetics of animals and plants. The cost of whole-genome sequencing (WGS) and genotyping-by-sequencing (GBS) has declined in recent years as sequencing technologies have developed, and massive amounts of population-scale whole-genome sequencing data from various animals and plants have been released. These data are a powerful aid to improving variety breeding, screening traits, and elucidating the mechanisms of phenotypic variation in animals and plants. However, whole-genome sequencing of large-scale populations is still very expensive. Research has shown that genotype imputation with reference panels can effectively increase SNP density, reduce genotyping costs, improve the power of association studies, and facilitate the identification of causal variants for complex traits. In humans, a large number of population genotype reference panels have been constructed, and several resources and tools have been developed for pre- and post-GWAS analyses. High-quality reference panels and the corresponding supporting tools are still lacking for animals and plants, which greatly limits genetic studies of these species. To bridge this gap, this study integrated animal and plant genotype data and whole-genome sequencing data from large publicly available databases, and developed a corresponding analysis pipeline to systematically identify and filter SNPs. The filtered SNPs were used to construct high-quality reference panels and to perform further analyses. The main results are as follows.

First, the Plant-ImputeDB database was constructed. For Plant-ImputeDB, we collected publicly available genome sequencing or genotype data from widely studied plant databases; developed a dedicated processing pipeline to systematically identify SNPs; filtered samples and SNPs; used the remaining samples and SNPs to construct reference panels; and evaluated the haplotype robustness and imputation accuracy of the reference panels. In total, high-quality reference panels were constructed for 12 plant species, comprising 69.9 million SNPs from 34,244 samples. The genotype data and related analysis results were integrated into the Plant-ImputeDB database, which is served online. Plant-ImputeDB provides online genotype imputation, searching of SNPs and haplotype blocks, browsing of accession information, and data download. It also allows users to submit different types of genomic variation, and provides free and open access to all publicly available data. Plant-ImputeDB can be accessed at http://gong_lab.hzau.edu.cn/Plant_imputeDB/.

Second, the Animal-SNPAtlas database was constructed. For Animal-SNPAtlas, we collected genotype data or whole-genome sequencing data from widely studied animal databases; developed a workflow to systematically identify and process SNPs; filtered SNPs and samples and constructed high-quality reference panels; evaluated the imputation accuracy of the reference panels; and performed functional annotation and linkage disequilibrium (LD) calculations for the SNPs in the reference panels. The genotype data and related analysis results were integrated into the Animal-SNPAtlas database, which is served online. Animal-SNPAtlas is a comprehensive animal genetic variation database containing ~685 million SNPs from 2,977 samples across 24 animal species. It supports five main functions: (1) online genotype imputation based on high-quality reference panels; (2) searching of functional annotations for genetic variants; (3) genome-wide LD information search and LD visualization; (4) SNP visualization in a genome browser; and (5) download of high-quality reference panels. Animal-SNPAtlas can be accessed at http://gong_lab.hzau.edu.cn/Animal_SNPAtlas/.

In conclusion, Plant-ImputeDB and Animal-SNPAtlas, as important resources in the field of animal and plant genetics, are expected to promote genetic research, including genomic selection and genetic improvement, in animals and plants. In the future, the databases will be updated by incorporating reference panels for new species, increasing the number of samples for existing species, and adding new functions and imputation software, which will improve the data quality and functionality of the databases. We believe that Animal-SNPAtlas and Plant-ImputeDB will be rich and valuable resources in the field of animal and plant genetics.
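The abstract refers to evaluating imputation accuracy and computing LD without stating the metrics used. As a rough illustration only, the Python sketch below shows two measures that are commonly used for such evaluations: genotype concordance (the share of imputed calls that match the true calls) and the squared correlation of allele dosages, which can also serve as an approximate LD r² between two SNPs when genotypes are unphased. The function names, the 0/1/2 dosage coding, and the example genotypes are assumptions made for this sketch and are not taken from the thesis.

```python
def concordance(true_gt, imputed_gt):
    """Fraction of genotype calls that match exactly.

    Genotypes are assumed to be coded as alternate-allele counts (0, 1, 2).
    """
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("genotype lists must be non-empty and equal in length")
    matches = sum(t == i for t, i in zip(true_gt, imputed_gt))
    return matches / len(true_gt)


def squared_correlation(x, y):
    """Squared Pearson correlation between two dosage vectors.

    Applied to (true, imputed) dosages at one SNP it is an accuracy measure;
    applied to the dosages of two different SNPs across the same samples it
    is an approximation of LD r^2 for unphased genotypes.
    Returns NaN when either vector has no variation.
    """
    if len(x) != len(y) or not x:
        raise ValueError("dosage lists must be non-empty and equal in length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return (cov * cov) / (var_x * var_y)


# Hypothetical example: eight samples at one SNP, one call imputed wrongly.
true_calls = [0, 1, 2, 1, 0, 2, 1, 0]
imputed_calls = [0, 1, 2, 1, 0, 1, 1, 0]

print("concordance:", concordance(true_calls, imputed_calls))
print("dosage r^2:", squared_correlation(true_calls, imputed_calls))
```

In a typical evaluation, a subset of known genotypes would be masked, imputed against the reference panel, and then compared with the held-out truth using measures like these; whether the thesis followed this exact procedure is not stated in the abstract.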
Keywords/Search Tags:Single-nucleotide polymorphisms, reference panels, genotype imputation, variation annotation, linkage disequilibrium, visualization, database