Construction Of The Soybean Genome Database SoybeanGDB

Posted on: 2023-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: H R Li
Full Text: PDF
GTID: 2543306809955049
Subject: Biological engineering
Abstract/Summary:
Soybean (Glycine max) is an important crop worldwide that provides a large share of the protein and edible oil consumed by humans and has a wide range of other uses. With the development of sequencing technology, more and more plant genomes have been decoded. These genome data are usually stored in specialized databases, which play important roles in the study of plant gene function and in molecular-assisted breeding. In recent years, the gold-standard reference genome of China's cultivated soybean Zhonghuang 13 and other high-quality soybean genomes have been published. However, these data have not been properly stored in specialized databases. In this study, 40 high-quality soybean genomes were collected, together with high-quality SNPs (single nucleotide polymorphisms) and InDels (insertions and deletions) among 2,898 soybean varieties. Based on these datasets, we conducted transposon identification, structural variation analysis, homologous gene identification and gene function annotation in all 40 genomes. Finally, a comprehensive soybean genome database, SoybeanGDB, was constructed using the R/shiny software package, providing multiple functional modules for data retrieval, analysis and visualization. The main results of this study are as follows:
1. Thirty-two high-quality genomes were collected, and structural variations, transposons, transcription factors and transcriptional regulators were identified. Functional modules were built in the database for querying this information.
2. A genome browser was constructed for each of the 40 genomes using JBrowse2, so that users can view the information of the different genomes.
3. A total of 31,870,983 SNPs and 6,127,057 InDels identified among 2,898 soybean samples were collected. After filtering out low-quality data, 15,446,616 high-quality SNPs and 4,136,231 high-quality InDels were obtained. Based on these results, functional pages were constructed for querying SNP and InDel information, linkage disequilibrium analysis, nucleotide diversity analysis, single nucleotide polymorphism analysis, allele frequency analysis, etc.
4. Gene expression information of Zhonghuang 13 across multiple tissues and developmental stages was collected. Functional pages for gene expression query and gene co-expression analysis were constructed.
5. The chromosome sequences, protein sequences, gene sequences and CDS (coding sequences) of the 40 high-quality genomes were extracted to build the BLAST page, which offers blastn, blastp and other sequence alignment functions.
6. Based on the high-quality SNPs and InDels and the chromosome sequence of Zhonghuang 13, a primer design interface was constructed. Homologous genes among the 40 genomes were identified, and functions for primer design and for querying homologous gene information were provided.
7. GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) annotations were performed for the protein-coding genes of the 40 genomes. The annotation information was integrated into the database, and functional modules for annotation retrieval and gene functional enrichment analysis were provided.
Finally, after a period of design, development and testing, SoybeanGDB was constructed and deployed on a cloud server. Users can access SoybeanGDB online at https://venyao.xyz/SoybeanGDB/.
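As a rough illustration of the kind of per-site statistics behind the SNP filtering, allele frequency and nucleotide diversity pages described above, the following Python sketch computes them for a single biallelic SNP. It is not taken from SoybeanGDB (which is built with R/shiny). The genotype coding (0, 1, 2 alternate-allele copies, `None` for missing), the function names and the filter thresholds `max_missing` and `min_maf` are all assumptions made for this example, since the abstract does not state the filtering criteria that were used.

```python
from typing import List, Optional

Genotypes = List[Optional[int]]  # assumed coding: 0/1/2 alt-allele copies, None = missing


def allele_frequency(genotypes: Genotypes) -> Optional[float]:
    """Alternate-allele frequency among called diploid genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def passes_filter(genotypes: Genotypes,
                  max_missing: float = 0.2,   # hypothetical threshold
                  min_maf: float = 0.05) -> bool:  # hypothetical threshold
    """Keep a site if its missing rate and minor allele frequency are acceptable."""
    freq = allele_frequency(genotypes)
    if freq is None:
        return False
    missing_rate = sum(g is None for g in genotypes) / len(genotypes)
    maf = min(freq, 1.0 - freq)
    return missing_rate <= max_missing and maf >= min_maf


def site_pi(genotypes: Genotypes) -> float:
    """Per-site nucleotide diversity for a biallelic SNP: 2pq * n / (n - 1),
    where n is the number of sampled chromosomes."""
    called = [g for g in genotypes if g is not None]
    n = 2 * len(called)
    if n < 2:
        return 0.0
    p = sum(called) / n
    return 2.0 * p * (1.0 - p) * n / (n - 1)


def window_pi(sites: List[Genotypes], window_length: int) -> float:
    """Average nucleotide diversity over a window of window_length base pairs."""
    return sum(site_pi(s) for s in sites) / window_length


if __name__ == "__main__":
    snp = [0, 0, 1, 2, None]          # five samples, one missing call
    print(allele_frequency(snp))      # alternate-allele frequency
    print(passes_filter(snp))         # passes the assumed thresholds?
    print(window_pi([snp], 100))      # diversity over an assumed 100 bp window
```

Sites that are monomorphic in the sample contribute zero to the window sum, which is why only variant sites need to be passed to `window_pi` while the divisor remains the full window length.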
Keywords/Search Tags:Soybean, Genome database, SNP, InDel, Zhonghuang 13, GO enrichment analysis, Genome browser