
Evaluation of Transcriptome Assembly and Construction of Multi-Omics Bioinformatics Platform in Tea Plant (Camellia sinensis)

Posted on: 2022-08-25 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: F D Li | Full Text: PDF
GTID: 1483306323987909 | Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Tea plant is the most important traditional cash crop in China, and the beverage made from its leaves is the world's most popular non-alcoholic drink after water, with great economic, health and cultural value. In recent years, progress in high-throughput transcriptome sequencing has strongly advanced basic research on tea plant biology. However, most current assembly software and analysis protocols were designed for transcriptome data from model plants, and applying them to non-model plants such as tea raises many problems, so suitable methods and strategies for assembling and analysing the tea plant transcriptome are urgently needed. At the same time, as tea genomic and metabolomic data accumulate, a comprehensive analysis platform is needed that can integrate large amounts of genomic information of different types together with analysis methods. To this end, this study draws on tea plant genomic, transcriptomic and metabolomic data resources to optimise strategies for organising tea plant transcriptome data. It also constructs a tea plant genome database and analysis platform built on a common computer system, an open database platform architecture and an efficient database storage system, combining an intelligent search engine, friendly and diverse data presentation and easy-to-use bioinformatics analysis tools. The aim is to promote the sharing of tea plant biology big data and the mining of genomic data of all types. The main findings of this study are as follows:

(1) High-depth second-generation transcriptome sequencing data from eight representative tissues of tea plant were randomly subsampled to simulate a whole-tissue mixed assembly with 32 Gb of sequencing data. Five mainstream de novo transcriptome assemblers (SOAPdenovo, Trans-ABySS, Trinity, Bridger and BinPacker) were then applied to tea
plant transcript reconstruction, and the resulting assemblies were evaluated by comparison against a plant single-copy orthologous gene library (BUSCO), annotation against public databases and analysis of transcript expression patterns. With 32 Gb of simulated sequencing data, BinPacker and Bridger outperformed the other three assemblers on all evaluation metrics. Further comparison showed that Bridger was slightly better than BinPacker in transcript N50 length, mean sequence length and sequence completeness, with assembly completeness comparable to that obtained by third-generation transcriptome sequencing of tea plant. This suggests that these two assemblers, especially Bridger, may be better suited to de novo assembly of tea plant transcripts.

(2) To evaluate the effect of data volume on tea plant transcriptome assembly, reads were randomly selected to build whole-tissue mixed and single-tissue datasets of different sizes, and datasets of 4 to 84 Gb were each assembled with Bridger. Basic assembly statistics and BUSCO evaluation showed that all indicators were best when the whole-tissue mixed assembly used 48 Gb of data, indicating that 48 Gb is the preferred sequencing volume for whole (multi-)tissue assembly of tea plant. Further evaluation of single and multiple tissues at different data volumes showed the following: 1) the number of assembled transcripts increased with single-tissue sequencing volume, while the BUSCO missing rate decreased. At 6 Gb, the BUSCO missing rate was below 20% for six of the
eight tissues; when the data volume was increased to 9 Gb, the BUSCO missing rate was below 20% for all eight tissue assemblies, and the completeness of the young-leaf assembly even exceeded 90%. 2) Multi-tissue assemblies at different data volumes followed a pattern similar to that of single-tissue assemblies, and mixed assemblies of more than two tissues yielded more transcripts with higher completeness than single-tissue assemblies. These results suggest that 6 to 9 Gb is a cost-effective data volume for single-tissue assembly, while increasing the sequencing volume of a single tissue or performing a mixed assembly of multiple tissues can further increase the number of transcripts and the completeness of the assembly.

(3) Through extensive data collection and sequencing, this study compiled the tea plant genome map together with a total of 97 transcriptomes, metabolomes, methylomes, germplasm resources and a large number of gene expression profiles under biotic and abiotic stresses from 24 Camellia species. These datasets were linked through approaches such as the correlation between gene expression and metabolite distribution patterns. Using MySQL for data storage, web server tools and various computational extension packages written in Java, a tea plant genome database platform built around the tea plant genome map was constructed. The platform's overall interface is implemented as front-end HTML5 web pages that integrate a high-performance search engine and friendly visualization tools, offering basic search and result display as well as batch download of the many types of data. By integrating various bioinformatics analysis tools (such as BLAST, GO and KEGG functional enrichment, correlation analysis, homologous gene search, ORF search, polymorphic SSR identification and primer design), it helps researchers to
quickly retrieve and deeply mine the rich omics data in the database, with batch data retrieval and visualization. Using the transcriptomic and genomic data of different tea group (Camellia sect. Thea) species collected in TPIA as an application example, this study constructed a preliminary phylogenetic framework for representative tea group plants. The species fell into three groups: cultivated tea plants clustered together and formed a sister group with C. makuanica and C. tachangensis, while Dali tea (C. taliensis) occupied a basal position. Leaf accumulation data for catechins and caffeine in tea group plants were then retrieved and mapped onto this phylogeny according to species relationships. The content of quality-related metabolites such as catechins increased along the evolutionary trajectory, with recently diverged tea plants accumulating more catechins and caffeine than species on ancient branches. These data reveal the dynamic evolution of quality-related characteristic metabolites in tea group plants, providing new insights and important clues for future functional genomics research and breeding in tea plant.

In summary, by studying and optimising the transcriptome assembly strategy for tea plant, this study identified assembly software and sequencing volumes suitable for second-generation transcriptome assembly of tea plant. In parallel, genomic and other related omics data were extensively collected and analysed, and the most comprehensive genomic database platform for tea plant was constructed. The platform provides a rich data and theoretical basis for molecular biology research on tea plant and will help advance the functional genomics, evolutionary biology and population genetics of tea plant, allowing full exploitation of the high-quality genetic resources
of tea plant, thereby guiding genetic breeding and variety improvement and promoting the sustainable development of the tea industry.
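Finding (1) compares assemblers partly by transcript N50 length and mean sequence length. The abstract does not say which tool produced these statistics, so the following is only a minimal sketch of how they are commonly computed from an assembled FASTA file. N50 is taken here as the length of the transcript at which the cumulative length, summed from the longest transcript down, first reaches half of the total assembly length. The function names `fasta_lengths` and `assembly_stats` are hypothetical.

```python
def fasta_lengths(lines):
    """Return the length of each sequence in FASTA-formatted lines."""
    lengths = []
    current = None  # None until the first header line is seen
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)  # sequence may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths):
    """Return (count, total_bases, mean_length, n50) for transcript lengths."""
    if not lengths:
        raise ValueError("assembly contains no sequences")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        # N50: first length at which the cumulative sum reaches half the total
        if 2 * running >= total:
            n50 = length
            break
    return len(ordered), total, total / len(ordered), n50


if __name__ == "__main__":
    demo = [">t1", "ACGTACGT", "AC", ">t2", "ACGT", ">t3", "A" * 20]
    print(assembly_stats(fasta_lengths(demo)))
```

For a real assembly, an open file handle can be passed directly, e.g. `with open("assembly.fasta") as fh: assembly_stats(fasta_lengths(fh))`, where the file name is a placeholder.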
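Findings (1) and (2) build datasets of fixed size (32 Gb, and 4 to 84 Gb) by randomly extracting reads from deeper sequencing data. The abstract does not describe the exact procedure, so this is only a sketch of one common approach. It assumes paired-end reads held in memory as (read1, read2) string tuples, shuffles the pairs with a fixed seed, and keeps whole pairs until their combined bases reach the target. The function name `subsample_pairs` and the default seed are hypothetical; for real data volumes the reads would be streamed from FASTQ files rather than held in a list.

```python
import random


def subsample_pairs(pairs, target_bases, seed=42):
    """Randomly select whole read pairs until target_bases is reached.

    pairs: sequence of (read1, read2) tuples of nucleotide strings.
    Returns (selected_pairs, total_bases). If the input holds fewer bases
    than requested, every pair is returned.
    """
    order = list(range(len(pairs)))
    random.Random(seed).shuffle(order)  # reproducible random order
    selected, total = [], 0
    for index in order:
        if total >= target_bases:
            break
        read1, read2 = pairs[index]
        selected.append(pairs[index])  # keep mates together
        total += len(read1) + len(read2)
    return selected, total


if __name__ == "__main__":
    # Toy library of 10 pairs of 150-bp reads (300 bases per pair).
    library = [("A" * 150, "C" * 150) for _ in range(10)]
    chosen, bases = subsample_pairs(library, target_bases=1000)
    print(len(chosen), bases)
```

Because whole pairs are kept, the result slightly overshoots the target (4 pairs and 1,200 bases for the 1,000-base target above). A 32 Gb dataset would correspond to `target_bases=32 * 10**9`.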
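Finding (3) links datasets in the platform through the correlation between gene expression and metabolite distribution patterns, and correlation analysis is also among the integrated tools. The abstract does not say which coefficient is used. Assuming Pearson's r computed across matched samples, a minimal sketch of screening genes against one metabolite profile could look like this. The gene names, values and the |r| >= 0.8 cut-off are illustrative assumptions, not values from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns math.nan when either sequence is constant (r is undefined).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def correlated_genes(expression, metabolite, threshold=0.8):
    """Return (gene, r) pairs with |r| >= threshold, strongest first.

    expression: dict mapping gene ID -> expression values per sample.
    metabolite: metabolite content in the same sample order.
    """
    hits = []
    for gene, profile in expression.items():
        r = pearson(profile, metabolite)
        if not math.isnan(r) and abs(r) >= threshold:
            hits.append((gene, r))
    hits.sort(key=lambda item: abs(item[1]), reverse=True)
    return hits


if __name__ == "__main__":
    # Hypothetical expression of three genes in four samples, and a
    # hypothetical catechin content for the same four samples.
    expr = {"geneA": [1.0, 2.0, 3.0, 4.0],
            "geneB": [4.0, 3.0, 2.0, 1.0],
            "geneC": [1.0, 3.0, 1.0, 3.0]}
    catechin = [10.0, 20.0, 30.0, 40.0]
    print(correlated_genes(expr, catechin))
```

Both positive and negative correlations pass the filter, since either can indicate a candidate regulatory or biosynthetic link worth following up. A rank-based coefficient such as Spearman's would be a reasonable alternative when expression values are highly skewed.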
Keywords/Search Tags: Tea plant, Transcriptome, De novo assembly, Bioinformatics, Database platform