| Objective: Indian medicine, one of the top three traditional medicine systems, plays an important role in basic health care in India. With the world’s pursuit of plant-based medical systems, Indian medicine, especially Indian herbal medicine, is beginning to emerge on the world stage. The surge in demand for herbs has been accompanied by a scramble over development and quality. Genomic data such as nuclear genomes, transcriptomes, chloroplast genomes (cp-G) and DNA barcodes strongly support the healthy and sustainable development of herbal medicine, and can help resolve problems in breeding, standardized planting, heterologous synthesis of active ingredients, correct identification, and genetic relationship analysis. At present, however, the relatively small amount of omics data on herbs is scattered across public databases that store molecular data for all kinds of organisms, which hinders researchers from using these data in studies on herbs. The Indian Pharmacopoeia (IP) is the national code of Indian pharmaceutical standards promulgated by the Indian Pharmacopoeia Commission. According to our preliminary statistics, the herbs and herbal products section of the IP includes 66 herbal species with clear Latin names. To better serve Indian herbs with genomic data, we established the Indian Pharmacopoeia herbal genome database (IPGD) based on the list of herbal species included in the IP. Method: IPGD is built mainly on the LNMP (Linux + Nginx + MySQL + PHP) server architecture; the cp-G data generated in this study were acquired, assembled and analyzed using a second-generation sequencing platform and various bioinformatics methods and software (such as FastQC, Skewer, BLAST, ABySS, Plann, MAUVE, etc.). Main findings: 1) This study completed the construction of IPGD (v1.0). The database can be accessed free of charge over the Internet (http://ipgenome.org/); users can view and browse the data, associated metadata and species introductions of IP herbs in this
database, perform molecular identification, and download and upload relevant data. At present, the database includes basic introductions to 66 herbal species, 1529 ITS2 sequences from 62 species, 583 psbA-trnH sequences from 56 species, 69 cp-G sequences from 52 species, the best available nuclear genome data for 16 species, and metadata for 857 second-generation genome sequencing raw datasets from 12 species whose nuclear genome assemblies have not been published and for 2075 second-generation transcriptome sequencing raw datasets from 31 species. Each data type is stored in its own data formats. Basic information and pictures of the corresponding herbs and plants are provided under the species introduction. The database allows and encourages users to download data and to register an account to submit relevant data. It also provides a species identification function based on BLAST, DNA barcodes and cp-G sequences. In addition, for convenient information retrieval, the database provides a site-wide text search function and collates synonyms under each species introduction, which avoids omissions caused by different names. 2) The cp-G sequences of 25 herbs in the database are published for the first time in this study. The results of assembling and analyzing these data are as follows: (1) the cp-G of most species is highly conserved in size and structure, mostly between 150-160 kb with the typical quadripartite structure; (2) the cp-Gs of Trigonella foenum-graecum (fenugreek) and Astragalus gummifer are smaller (about 120 kb) owing to loss of the IR region, whereas expansion of the IR region makes the cp-Gs of Acacia nilotica subsp. indica and Berberis aristata larger (about 170 kb); (3) the genes annotated in the cp-Gs are mostly related to photosynthesis, transcription and translation; (4) the number of genes is about 135, with more in Acacia nilotica subsp. indica and Berberis aristata (149 and 141) and fewer in fenugreek and Astragalus gummifer (111 each); (5) a total of 10 genes (including rps16, accD, rpl32, rpl33, infA and ycf15) are
lost or pseudogenized in different species; (6) 18 intron-containing genes were annotated, and in some species their introns increase or decrease in number or disappear entirely; (7) some gene pairs overlap in all or some of the species; (8) gene order in the cp-G of most species is consistent with that of tobacco, whereas the LSC and SSC regions of fenugreek and Astragalus gummifer are inverted. 3) RNA editing site prediction for cp-G protein-coding genes and repeat sequence analysis were completed for the above 25 herbs. Analysis at the genus level revealed phenomena specific to certain groups, such as pseudogenization of accD and expansion of rpl23 in Boerhavieae; loss of rpl2 introns in Nyctaginaceae; pseudogenization of rpl33 in Rubieae; and loss of ycf15 and expansion of accD and ycf1 in Asclepiadoideae. A linear relationship was also found between the expansion of accD and ycf1 and repetitive sequences. Conclusions: This study published 25 cp-G sequences for the first time and established the first dedicated IP herbal genome database. The database integrates storage of data and metadata, herbal species descriptions, molecular identification and other functions. It makes it convenient for researchers to obtain published omics data in a timely manner and make full use of them. Furthermore, it provides molecular support for the healthy and sustainable development of herbal medicine worldwide. |