
Regulation Of Gene Expression And eQTL Analysis Of Structural Variation During Seed Development In Brassica napus

Posted on: 2023-02-19; Degree: Doctor; Type: Dissertation
Country: China; Candidate: D X Liu
GTID: 1523306842964249; Subject: Bioinformatics
Abstract/Summary:
Brassica napus is one of the most important oilseed crops in China, and its planting area ranks first in the world year after year. Rapeseed oil is a traditional edible oil in China, accounting for more than 50% of total domestic edible oil. B. napus is an allotetraploid species that originated from spontaneous hybridization between Brassica rapa and Brassica oleracea about 7,500 years ago. Owing to frequent exchanges between the subgenomes, loss of redundant genes after genome doubling, and sub-functionalization, massive variation in genome sequence and gene expression has accumulated and affects the phenotypes of B. napus. With the development of sequencing technology, a large amount of genomic and transcriptomic data has been accumulated for B. napus, and several variations and candidate genes affecting important phenotypes have been identified. However, a complete set of gene expression profiles and a systematic analysis of how sequence variations affect gene expression and metabolic pathways in B. napus are still lacking.

In this study, 273 high-temporal-resolution transcriptomes covering all growth stages and tissues of B. napus were used to build the transcriptome database BnTIR. The 26 time-course, high-resolution transcriptomes of developing seeds were used to analyze the gene expression regulatory network of B. napus during seed development, and genes related to oil synthesis and phenylpropanoid metabolism were mined. The large-InDel detection tool IndelEnsembler was developed, and its performance was evaluated using different data sets and in different species. The large InDels in 505 samples of B. napus were used to identify eQTLs (expression quantitative trait loci) and the genes they regulate (eGenes). The main results are as follows:

1. B. napus transcriptome resource collection and database establishment
In this study, 2,653 B. napus transcriptomes were collected and sequenced, including 110 from chemical treatments, 203 from biotic treatments, 701 from abiotic treatments and 1,639 from normal tissues. Transcriptome sequencing was performed on 273 samples covering the whole growth period of B. napus, and the transcriptome database BnTIR was built. The database contains 21,506 homologous relationships between Arabidopsis and B. napus, provides information on 5,955 transcription factors (TFs) from 58 transcription factor families, and offers sequence extraction from 11 B. napus genomes and 3 ancestral diploid genomes. It also provides retrieval of regulatory networks, including a gene-gene co-expression network with 1.56 million edges and a TF-gene network with 1.5 million edges. In addition, tools such as eFP, gene ID conversion, sequence alignment, a genome browser and heat mapping are available. This database can help researchers quickly mine candidate genes and analyze the expression characteristics of target genes, and it provides a basis and support for the study of gene function. The database has been widely used since its launch in October 2020, with more than 43,800 visits from 39 different countries and regions.

2. Regulation of gene expression and co-expression network construction during seed development in B. napus
To systematically analyze the regulation of gene expression during seed development and explore key genes, a weighted gene co-expression network was constructed with WGCNA using the transcriptomes at 26 time points during seed development. The seed development process was clearly clustered into five distinct groups, corresponding to embryonic development, seed filling (rapid accumulation period and stable period), the preparatory desiccation phase and desiccation, and a functional co-expression module was identified for each stage. Thirty-five hub genes overlapping with TWAS-significant genes were detected, including TT1, TT5, TT19, BAN and other procyanidin biosynthesis genes. Regulatory networks of phenylpropanoid metabolism and acyl lipid synthesis were constructed separately. The expression of BnaA03.DOF4.4 (BnaA03G0459300ZS) was positively correlated with seed oil content (SOC) and negatively correlated with seed coat content (SCC). In contrast, the expression level of the candidate gene BnaA08.ALCA-3 (BnaA08G0294900ZS) was negatively correlated with SOC but positively correlated with SCC. The expression levels of BnaC07.MORC7 (BnaC07G0460800ZS) and BnaC01.PGI1 (BnaC01G0181100ZS) were positively correlated with SOC and negatively correlated with SCC. In this study, the regulatory network of gene expression at different stages of seed development was analyzed, and genes co-expressed with the phenylpropanoid metabolic pathway and fatty acid synthesis were identified, providing a valuable reference for understanding the regulatory mechanism of substance synthesis and carbon partitioning during seed development.

3. Development and performance evaluation of a large-InDel detection tool
In this study, we integrated four existing methods to develop IndelEnsembler, a tool for detecting large InDels. The accuracy of IndelEnsembler was evaluated in Arabidopsis, soybean and B. napus using sequencing data of different depths, and its performance was compared with GRIDSS and Manta. The results showed that IndelEnsembler performed as well as or better than GRIDSS and Manta at different sequencing depths. Compared with the existing method AthCNV, the InDels identified by IndelEnsembler were more complete and accurate. We applied IndelEnsembler to call large InDels in 1,047 Arabidopsis accessions; the final callset consists of 34,093 deletions (DEL), 12,913 tandem duplications (DUP) and 9,773 insertions (INS). Large InDels occur more often in transposable element genes, pseudogenes and intergenic regions of Arabidopsis, but are depleted in protein-coding genes and genic regions. Genome-wide association studies (GWAS) based on large InDels found two significant loci for flowering time under 16, on chromosome 1 and chromosome 4 in Arabidopsis. The highly accurate tool developed in this study is an important resource for uncovering structural variations that affect the phenotypes of Arabidopsis and other species.

4. Identification of large InDels and eQTL analysis in B. napus
To explore how large InDels affect phenotypes at the genome-wide level by affecting gene expression, resequencing data from 505 B. napus samples were used to identify large InDels, and eQTL analysis was performed based on gene expression profiles at different stages of seed development. A total of 119,948 large InDels were identified in the ZS11 genome, including 22,417 new variants. DEL and DUP were enriched near the centromeres, suggesting that the centromeres were the main source of variation in B. napus. The eQTL results showed that 9,465 eQTLs were significantly associated with the expression of at least one gene, and 9,847 eGenes were regulated by eQTLs. The expression variation associated with cis-eQTLs was significantly higher than that associated with trans-eQTLs. The expression of 54.1% of eGenes was regulated by a single eQTL, and 65.1% of eQTLs regulated a single gene, indicating that most gene expression variation occurred under relatively simple genetic control. The large InDels identified in this study provide important genetic resources for the genetic analysis of complex traits in B. napus, and the large number of eQTLs and regulated eGenes provides important support for constructing gene regulatory networks and analyzing gene regulation mechanisms in B. napus.
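
As an illustration of how summary figures like the single-eQTL and single-gene percentages in result 4 can be derived, the following minimal Python sketch counts partners in a list of significant eQTL-eGene associations. It is not the dissertation's pipeline: the function name, the input layout and the toy identifiers are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def single_partner_shares(pairs):
    """Return (share of eGenes with exactly one eQTL,
               share of eQTLs associated with exactly one eGene).

    `pairs` is an iterable of (eqtl_id, egene_id) tuples, one per
    significant association. This input layout is an assumption.
    """
    eqtls_per_gene = defaultdict(set)
    genes_per_eqtl = defaultdict(set)
    for eqtl_id, egene_id in pairs:
        eqtls_per_gene[egene_id].add(eqtl_id)
        genes_per_eqtl[eqtl_id].add(egene_id)
    if not eqtls_per_gene:
        raise ValueError("no associations supplied")
    gene_share = sum(len(s) == 1 for s in eqtls_per_gene.values()) / len(eqtls_per_gene)
    eqtl_share = sum(len(s) == 1 for s in genes_per_eqtl.values()) / len(genes_per_eqtl)
    return gene_share, eqtl_share


# Toy data with made-up identifiers, not values from the study.
toy_pairs = [
    ("indel_1", "gene_A"),
    ("indel_1", "gene_B"),
    ("indel_2", "gene_B"),
    ("indel_3", "gene_C"),
    ("indel_3", "gene_D"),
]
gene_share, eqtl_share = single_partner_shares(toy_pairs)
print(f"eGenes with a single eQTL: {gene_share:.1%}")
print(f"eQTLs with a single eGene: {eqtl_share:.1%}")
```

Sets are used so that a duplicated association in the input does not inflate the partner counts.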
Keywords/Search Tags: B. napus, Transcriptome sequencing, Gene regulation network, Large InDels, eQTL