| Coral ecosystems occupy a very important position in global marine ecology, providing habitat and nutrients for a large number of marine organisms. Coral reefs are formed by large numbers of coral polyps, which gradually secrete and accumulate calcium carbonate minerals over hundreds or even thousands of years. Coral reefs around the world are currently under threat of "bleaching". Its main manifestation is the expulsion of the symbiotic algae that live within coral cells and provide the coral with nutrition and energy, resulting in the death of the coral polyps. Global warming and ocean acidification are believed to be important environmental factors leading to coral bleaching. However, the physiological mechanisms of coral bleaching are not fully understood. Transcriptomics is a main tool for studying the molecular mechanisms of coral/zooxanthellae symbiosis. However, compared with model organisms, coral transcriptome research faces the following difficulties. First, the genomes of most corals have not yet been sequenced, some of the sequenced genomes still contain a large number of gaps, and the functions of many genes have not been annotated. Second, there is a lack of tools for mining current coral omics databases. Third, with the rise of single-cell technology, single-cell transcriptomics has become a powerful tool for studying coral symbiosis mechanisms, but, compared with model organisms, researchers know relatively little about coral cell types and their specific gene markers. These problems are among the main difficulties hindering coral biology research. In this paper, the following work has been done to address the above problems: (1) Construction of a bioinformatics analysis pipeline based on third-generation full-length transcripts and second-generation transcriptome data. To provide high-quality transcript references and annotations for coral transcriptome research, a full-length transcriptome bioinformatics analysis pipeline was constructed by 
combining the high accuracy of next-generation sequencing with the long read length of third-generation sequencing. The pipeline takes into account both the accuracy and the completeness of the sequences, and integrates gene function annotation and enrichment analysis tools. (2) Construction of a coral multi-omics database and mining of important genes related to symbiont recognition between corals and symbiotic algae. On the basis of data from published articles and our laboratory's own data, we developed a multi-omics database of corals and symbiotic algae. This database is currently the most comprehensive genome and transcriptome database in the coral field. At present, it contains 31 coral genomes, 12 coral transcriptomes, 4 full-length transcriptomes with annotation information collected by our laboratory, 9 Symbiodinium genomes, and 10 Symbiodinium transcriptomes. We also included single-cell data from three cnidarians that can form symbioses with Symbiodinium, as well as single-cell data from each life stage of a coral, and integrated multiple analysis tools such as enrichment analysis, homology alignment, and interaction visualization. Using the above platform, the full-length transcriptome of the reef-building coral Montipora foliosa was sequenced for the first time. We identified and summarized the genes involved in symbiont recognition in the coral-zooxanthellae symbiotic system in M. foliosa, including complement C2, C3 and C1q, C-type lectin, TGF-β, SRs, SGPP, SPH, nod1, TLRs and Rabs, 334 genes in total. The functions and isoforms of these genes were analyzed. At the same time, gene ontology and pathway enrichment analyses were performed on the highly expressed genes. A total of 156 metabolism-related genes were localized to the cell membrane, and enrichment analysis showed that they were involved in glycan biosynthesis. On this basis, the regulatory mechanism of the interaction between the coral M. foliosa and its symbiotic algae is discussed, and a hypothesis 
is proposed: coral cells can convert the monosaccharides produced by the algae into glycogen, thereby avoiding the adverse effects of fluctuations in intracellular sugar concentration on coral cells. This work provides strong support for coral symbiosis research. (3) Development of a gene correlation visualization tool based on gene interaction networks. To understand the mechanism of symbiosis, we need to understand the interactions among coral genes, but current biological network analysis tools cannot identify higher-order connections. To this end, drawing on recent progress in graph neural networks, we propose a graph embedding-based gene interaction mining algorithm. This paper compares three commonly used graph embedding methods (DeepWalk, Node2vec, and LINE) on gene interaction networks and proposes a gene correlation visualization method. The three embedding methods were scored from the perspectives of clustering and interaction aggregation, and the method proposed in this study was compared with the force-directed layout algorithm of the gene network visualization tool Cytoscape. The results show that: 1. From the perspective of visualization, DeepWalk and Node2vec perform slightly better than LINE, and Node2vec's scores are more stable than DeepWalk's. 2. Graph embedding algorithms can effectively capture node interaction information in a low-dimensional space. Our method significantly outperforms the Cytoscape-based gene interaction layout algorithm. We also developed a web-based visualization tool, BENviewer. (4) A biological significance mining tool based on single-cell transcriptomes. Non-model organisms such as corals have a large number of genes with unknown functions, so directly mining cell type-specific genes may not be conducive to interpreting cell functions. However, genes all have specific functional classifications. Gene Set Analysis (GSA) is one of the main tools for biological function mining and has been widely used. Unlike 
most currently developed GSA tools, which are only applicable to cell population analysis, GSA at the individual-cell level can aid in exploring the functions of rare cell types and the correlations between cells. To this end, we developed the Functional Expression Matrix (FEM) algorithm. The algorithm converts the gene expression matrix (GEM) into FEM form, which has the following three advantages. First, it assists the discovery of functional similarities and differentiation trajectories between cells, and provides ideas for studying the origin of specific cells such as those involved in symbiosis and calcification. Second, by analyzing function, it avoids interference from the many genes of unknown function and provides a reference for analyzing functional differences between cells and grouping them. Third, it helps to discover the functions of small numbers of special cells. FEM can also be integrated with GEM for downstream analysis. FEM studies on three datasets (peripheral blood mononuclear cells, human liver, and human pancreas) helped reveal cell differentiation relationships and a small number of cells in the proliferative stage. |
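
The GEM-to-FEM conversion described above can be illustrated with a minimal sketch. The abstract does not state how FEM scores are computed, so the scoring rule used here (the mean expression of a gene set's measured genes in each cell) is purely an assumption, and the function name `gem_to_fem`, the gene names, and the gene sets are hypothetical placeholders rather than part of the FEM tool itself.

```python
import numpy as np


def gem_to_fem(gem, gene_names, gene_sets):
    """Aggregate a genes x cells matrix into a gene-sets x cells matrix.

    Assumption (not from the source): the score of a gene set in a cell is
    the mean expression of the set's genes that are present in the data.
    """
    index = {gene: i for i, gene in enumerate(gene_names)}
    set_names, rows = [], []
    for name, genes in gene_sets.items():
        idx = [index[g] for g in genes if g in index]
        if not idx:
            # Skip gene sets with no measured genes.
            continue
        set_names.append(name)
        rows.append(gem[idx, :].mean(axis=0))
    if not rows:
        return set_names, np.empty((0, gem.shape[1]))
    return set_names, np.vstack(rows)


# Toy data: 3 genes x 3 cells (hypothetical values).
toy_gem = np.array([
    [1.0, 0.0, 4.0],   # geneA
    [3.0, 0.0, 2.0],   # geneB
    [0.0, 5.0, 0.0],   # geneC
])
toy_genes = ["geneA", "geneB", "geneC"]
toy_sets = {
    "set1": ["geneA", "geneB"],
    "set2": ["geneC", "geneX"],  # geneX is not measured and is ignored
    "set3": ["geneY"],           # no measured genes, so the set is dropped
}

toy_names, toy_fem = gem_to_fem(toy_gem, toy_genes, toy_sets)
print(toy_names)
print(toy_fem)
```

Because the result keeps one column per cell, it can be clustered or embedded with the same downstream steps as the GEM, which is consistent with the integration of FEM and GEM mentioned above.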