Genetic Variation Mining And Analysis Of Genomic Data For Large Samples And Multiple Animals

Posted on: 2024-09-05
Degree: Master
Type: Thesis
Country: China
Candidate: G H Jiang
Full Text: PDF
GTID: 2530307160476444
Subject: Bioinformatics
Abstract/Summary:
In recent years, with the rapid development of high-throughput sequencing technology and the sharp reduction in sequencing costs, animal genomic data have grown rapidly, and animal genetic breeding and trait improvement have fully entered the genomic era. A large number of studies have already identified large-scale genetic variation in diverse human populations and developed corresponding data-sharing platforms. Compared with humans, however, genomic data mining and database construction for other animals still lag behind. In particular, comprehensive multi-species, large-sample genetic variation databases remain very limited, which greatly restricts the development of animal genomics and genetics research. To promote the mining and reuse of animal genetic data, this study systematically collected and curated whole-genome and genotype data for 5,572 samples from 20 animal species, then carried out re-identification of genetic variation, construction of imputation reference panels, and linkage disequilibrium analysis, and built a comprehensive genetic variation database for multiple animals (Animal-SNPAtlas). On this basis, we also conducted an exploratory study of multi-nucleotide variants (MNVs). The main results are as follows.

Single nucleotide polymorphisms (SNPs), which are changes of a single nucleotide at the genome level, are the most common type of genetic variation and are widely used in genetic research. In this study, whole genome sequencing (WGS) data or genotype data for 5,572 samples from 20 animal species were collected through systematic literature and database searches. About 515 million high-quality SNPs were obtained through systematic SNP re-identification and variant filtering, and these SNPs were functionally annotated. On this basis, we constructed high-density genotype imputation reference panels; simulation analysis showed that these panels effectively increase SNP density while maintaining high accuracy. Linkage disequilibrium analysis is also frequently used in genetic variation research, but no easy-to-use public query platform exists yet, so we further performed genome-wide linkage disequilibrium calculation. Finally, we developed a comprehensive SNP database for multiple animals based on the above data (http://gong_lab.hzau.edu.cn/Animal_SNPAtlas/). The database provides a range of functions, including SNP functional annotation and visualization, online genotype imputation, linkage disequilibrium calculation and visualization, and reference panel download.

MNVs are defined as clusters of two or more nearby variants located on the same haplotype in an individual. Recent studies have found that the human genome contains a large number of MNVs and that MNVs are more deleterious than single nucleotide variants (SNVs). However, no systematic analysis of MNVs specifically targeting animals has been reported. This study therefore developed a pipeline to systematically identify and annotate MNVs across the whole genome in common economic animals. In total, we identified about 21 million MNVs in 7 economic animal species. Only a small proportion of these MNVs fell within gene coding regions, yet 14,470 MNVs fell within the same codon of a gene. Unlike the effects annotated for the constituent SNVs individually, such MNVs might produce new functional effects by changing the encoded amino acid. This study also explored the distance distribution of MNVs and found that MNVs whose variants are 1 bp apart were the most common type. On this basis, it further examined the mutation patterns of adjacent MNVs and proposed potential mutation mechanisms. TG->CA was found to be the most frequent mutation pattern of MNVs, which might result from the combination of two high-frequency single nucleotide mutation events at CpG islands. These results provide an important theoretical basis for a comprehensive understanding of the function and mechanism of MNVs.

In summary, by systematically mining and analyzing animal genomic data, this study provides an important fundamental resource for the animal genomics, genetics and breeding community, and the MNV analysis offers new ideas and methods for animal genetic research.
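
The MNV definition used in the abstract (two or more nearby variants on the same haplotype in one individual) can be illustrated with a minimal Python sketch. It is not the pipeline developed in the thesis: the `PhasedVariant` record, the `find_mnv_pairs` and `mutation_pattern` helpers, and the 2 bp distance threshold are all hypothetical choices made for illustration, and the input is assumed to be already-phased biallelic SNVs from a single sample.

```python
from typing import List, NamedTuple, Tuple


class PhasedVariant(NamedTuple):
    """Hypothetical record for one phased, biallelic SNV in one individual."""
    chrom: str
    pos: int                # 1-based position
    ref: str
    alt: str
    hap: Tuple[int, int]    # phased genotype, e.g. (0, 1) for "0|1"


def find_mnv_pairs(
    variants: List[PhasedVariant], max_dist: int = 2
) -> List[Tuple[PhasedVariant, PhasedVariant, int]]:
    """Return (first, second, distance) for SNV pairs that lie within
    max_dist bp of each other and carry ALT on the same haplotype.
    max_dist is an assumed threshold, not a value taken from the thesis."""
    ordered = sorted(variants, key=lambda v: (v.chrom, v.pos))
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Input is sorted, so stop once we leave the window or chromosome.
            if b.chrom != a.chrom or b.pos - a.pos > max_dist:
                break
            if b.pos == a.pos:
                continue  # same site (e.g. multi-allelic); not a pair
            # Same haplotype: ALT present on haplotype 0 in both, or on 1 in both.
            if any(a.hap[h] == 1 and b.hap[h] == 1 for h in (0, 1)):
                pairs.append((a, b, b.pos - a.pos))
    return pairs


def mutation_pattern(a: PhasedVariant, b: PhasedVariant) -> str:
    """Describe an adjacent (1 bp apart) pair as 'REFREF->ALTALT'."""
    if b.pos - a.pos != 1:
        raise ValueError("pattern is only defined here for adjacent variants")
    return f"{a.ref}{b.ref}->{a.alt}{b.alt}"


# Toy data, invented for illustration only.
variants = [
    PhasedVariant("chr1", 100, "T", "C", (0, 1)),
    PhasedVariant("chr1", 101, "G", "A", (0, 1)),  # same haplotype as pos 100
    PhasedVariant("chr1", 102, "A", "G", (1, 0)),  # opposite haplotype
    PhasedVariant("chr1", 500, "C", "T", (1, 1)),  # too far from the others
]

for first, second, dist in find_mnv_pairs(variants):
    print(mutation_pattern(first, second), dist)  # prints "TG->CA 1"
```

Requiring a shared haplotype, rather than proximity alone, is what separates an MNV from two unrelated neighbouring SNVs: the variants at positions 101 and 102 are adjacent but sit on different haplotypes, so they are not reported.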
Keywords/Search Tags: High-throughput sequencing, genomics, genetic variation, single nucleotide polymorphisms, multi-nucleotide variants, genetic improvement