
Identification And Evolution Analysis Of Copy Number Variation Of Plant Fatty Acid Metabolism Genes

Posted on: 2020-09-10 | Degree: Master | Type: Thesis
Country: China | Candidate: B Y Tong
GTID: 2370330623961015 | Subject: Computer application technology
Abstract/Summary:
Gene gain and gene loss are the two major forms of copy number variation. Gene copy number changes frequently during the evolution of species, and these changes alter the size of gene families. Changes in gene family size may result from species adapting to their environment and can lead to morphological and biological differences between species. Comparative genomics can help uncover the patterns behind these changes and explain their causes. By comparing the genomes of different species and identifying gene families whose copy numbers vary, we can examine the relationship between gene copy number variation and adaptive evolution. We therefore selected ten plant genomes for analysis: Arabidopsis thaliana, Brassica napus, Gossypium arboreum, Gossypium hirsutum, Arachis hypogaea, sesame (Sesamum indicum), Zea mays, Glycine max, Olea europaea and Elaeis guineensis.

Genome-wide sequence data and alignment algorithms allow us to identify gene families and gene copy number variation in plants. Because larger genomes may contain multiple paralogs and their sequence information is often incomplete, we used a genome-wide comparison approach to identify all gene families: BLAST was used to find homologous genes between species, and the Markov clustering (MCL) algorithm used by OrthoMCL grouped these homologs into gene families. We then compared the copy number of each species within each gene family and selected the families showing copy number variation.

A total of 96,212 gene families containing 286,462 genes were obtained: 17 single-copy homologous gene families, 33,409 multi-copy homologous gene families and 62,786 species-specific gene families. Of the 33,426 shared (single-copy plus multi-copy) gene families, 5,840 families, comprising 42,348 genes, were inferred to be present in the most recent common ancestor of the ten species, with an average of 7.25 genes per family. The largest gene family contains 104 genes from all ten species. Of the 5,840 ancestral families, 4,890 have been completely lost in at least one species, while the 950 families that retain copies in all ten species most likely represent the core proteome of oil plants.

Comparative genomic studies have found that the copy numbers of genes involved in various cellular and developmental processes differ greatly among organisms, in some cases with entire gene families lost from particular lineages or new gene families emerging. Although such studies have begun to provide evidence for the molecular basis of phenotypic evolution, the time scales they consider are too long to link these changes to individual traits. The apparent constancy of total gene number across species masks a rapid turnover of individual gene gains and losses, a phenomenon likely to play an important role in shaping the morphological, physiological and metabolic differences between species.

To obtain comprehensive and accurate evolutionary estimates of copy number variation, we applied the probabilistic framework developed by Hahn et al., which assumes that all genes share the same probabilities of gain (birth) and loss (death), and used the expectation-maximization (EM) algorithm to estimate its parameters from the data. The results show an average gene turnover rate of 0.0034 gains or losses per gene per million years for these plants. At this rate, about 92 new duplications and 92 new losses become fixed in a single genome every million years (0.0034 gains or losses/gene/million years × 27,000 genes ≈ 92).

We used Gene Ontology (GO) analysis to classify and functionally annotate the species-specific gene families, to characterize the molecular functions and products of their genes, and to test them for functional enrichment. Most of these genes are enriched in functions and processes such as DNA binding, protein binding, regulation of metabolic processes, protein modification and fatty acid biosynthesis. This study offers a new perspective on the relationship between gene copy number variation and evolution, and the screening of species-specific genes provides a basis for studying the evolution of individual traits in plants.
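The abstract groups BLAST homologs into families with the Markov clustering (MCL) algorithm used by OrthoMCL. The sketch below illustrates only the core MCL loop (expansion, inflation, column normalization) on a small hypothetical similarity matrix. It is not OrthoMCL's pipeline: OrthoMCL's BLAST score normalization and ortholog/paralog weighting are not modeled, and the function names and default parameters are illustrative assumptions.

```python
"""Minimal Markov clustering (MCL) sketch on a small similarity graph.

Assumption: `weights` is a hypothetical symmetric matrix of pairwise
similarities (for example derived from BLAST scores); OrthoMCL's own
weighting scheme is not reproduced here.
"""
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / s if s else 0.0
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(weights: Matrix, expansion: int = 2, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9) -> List[List[int]]:
    n = len(weights)
    # Add self-loops (a common MCL convention), then make columns stochastic.
    m = normalize_columns([[weights[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        # Expansion: raise the matrix to a power so flow spreads along longer paths.
        e = m
        for _ in range(expansion - 1):
            e = matmul(e, m)
        # Inflation: elementwise power favors strong flows, then renormalize.
        new = normalize_columns([[x ** inflation for x in row] for row in e])
        diff = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if diff < tol:
            break
    # Each row that still holds mass is an attractor; the columns it attracts
    # form one cluster. Duplicate member sets are reported once.
    clusters, seen = [], set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in seen:
            seen.add(members)
            clusters.append(sorted(members))
    return clusters
```

For example, with a hypothetical graph of a triangle (genes 0, 1, 2) and a separate pair (genes 3, 4), `mcl` returns `[[0, 1, 2], [3, 4]]`: flow cannot cross between disconnected components. The inflation value controls granularity; larger values tend to produce smaller, tighter clusters.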
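The family counts in the abstract come from labeling each family by how its copies are spread across the ten species, then splitting the ancestral families into a core set (copies in every species) and a set lost in at least one species. A minimal sketch of that bookkeeping follows. The exact category rules are assumptions (for instance, "single-copy" is taken to mean exactly one copy in every species), and deciding which families were present in the common ancestor requires a species tree, which is not modeled here: `split_ancestral` simply receives families already assumed to be ancestral.

```python
from typing import Dict, List, Tuple

Counts = Dict[str, int]  # species name -> number of gene copies in one family


def classify_family(counts: Counts, species: List[str]) -> str:
    """Label one family by how its copies are distributed (assumed rules)."""
    present = [s for s in species if counts.get(s, 0) > 0]
    if not present:
        raise ValueError("family has no genes in any listed species")
    if len(present) == 1:
        return "species-specific"
    # Assumption: single-copy means exactly one gene in every species.
    if len(present) == len(species) and all(counts[s] == 1 for s in species):
        return "single-copy"
    # Shared by two or more species but not strictly single-copy.
    return "multi-copy"


def split_ancestral(families: Dict[str, Counts],
                    species: List[str]) -> Tuple[List[str], List[str]]:
    """Split families assumed present in the common ancestor into
    core (copies in every species) and lost (absent from >= 1 species)."""
    core, lost = [], []
    for name, counts in families.items():
        if all(counts.get(s, 0) > 0 for s in species):
            core.append(name)
        else:
            lost.append(name)
    return core, lost
```

With three hypothetical species `A`, `B`, `C`, a family `{"A": 1, "B": 1, "C": 1}` is single-copy, `{"A": 3, "B": 0, "C": 2}` is multi-copy and `{"B": 4}` is species-specific. If the first two are treated as ancestral, the first is core and the second counts as lost in `B`. The same bookkeeping gives the mean ancestral family size reported above: 42,348 genes / 5,840 families ≈ 7.25.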
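The turnover estimate rests on a birth-death model in which every gene has the same gain and loss rate. The sketch below evaluates the standard transition probability for a linear birth-death process with equal birth and death rates λ: the probability that a family of size s becomes size c after time t. It uses α = λt / (1 + λt). Whether the thesis used exactly this parameterization is an assumption, and the EM step that fits λ to the data is not shown. The final helper reproduces the abstract's arithmetic for expected events per genome.

```python
from math import comb


def transition_prob(s: int, c: int, lam: float, t: float) -> float:
    """P(size c at time t | size s at time 0) for a linear birth-death
    process whose birth and death rates both equal `lam`."""
    if s == 0:
        return 1.0 if c == 0 else 0.0  # an extinct family cannot regain genes
    alpha = lam * t / (1.0 + lam * t)
    total = 0.0
    for j in range(min(s, c) + 1):
        total += (comb(s, j) * comb(s + c - j - 1, s - 1)
                  * alpha ** (s + c - 2 * j) * (1 - 2 * alpha) ** j)
    return total


def expected_events_per_myr(lam: float, n_genes: int) -> float:
    """Expected gains (and, equally, losses) fixed per genome per million years."""
    return lam * n_genes
```

For one ancestral gene the formula reduces to P(0) = α, the familiar extinction probability. With λ = 0.0034 and 27,000 genes, `expected_events_per_myr` gives 91.8, which rounds to the roughly 92 duplications and 92 losses per million years stated above. Because birth and death rates are equal, the expected family size stays at s, which makes a convenient sanity check.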
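The abstract does not name the statistic behind its GO enrichment analysis. A common choice, assumed here, is a one-sided hypergeometric test: among N annotated background genes, K carry a given GO term. The test asks how likely it is to see at least k term-carrying genes in a set of n genes (for example, the species-specific genes) by chance. The sketch uses only `math.comb`, and all numbers in it are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background genes with GO annotation
    K: background genes carrying the GO term of interest
    n: genes in the test set (e.g. species-specific genes)
    k: test-set genes carrying the term
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

For a toy background of 10 genes, 3 of which carry a term, drawing 3 genes that all carry it has probability 1/120 under the null. In a real analysis one p-value is computed per GO term, so a multiple-testing correction (such as Benjamini-Hochberg) is normally applied before calling a term enriched.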
Keywords/Search Tags: Comparative genomics, Copy number variation, Markov cluster algorithm, Evolutionary analysis, Gene ontology