
Genomic Analysis Of Disease-related G-quadruplex And Database Construction

Posted on: 2022-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: H Li
Full Text: PDF
GTID: 2504306740479844
Subject: Biomedical engineering
Abstract/Summary:
G-quadruplex (G4) is a special nucleic acid secondary structure that is widely present in the genome, including telomeres, promoter regions and UTRs. G4 has important regulatory functions in the organism: it can affect chromatin structure, gene regulation and genome stability, and is therefore closely related to human diseases. Its distinctive spatial structure also makes it suitable as a drug target. There are many biological studies on G4 and diseases, but there is no disease-related G4 database. This research aims to analyze and mine disease-related G4 data, build a disease-related G4 database, and provide web-based browsing and searching tools for G4 data. This helps to study the relationship between G4 and disease systematically and provides important biological information for the development of new drugs. The specific research progress is reflected in the following three aspects:

(1) A whole-genome analysis method and a literature mining method for disease-related G4 were established, and disease-related G4 data were mined. The whole-genome analysis compares the genomic positions of G4 with those of disease-related SNPs and genes. If a disease-related SNP falls in a G4-forming region, or a functional region of a disease-related gene contains a G4 sequence, then that G4 sequence may be related to disease. In the whole-genome analysis, the G4 data come from G4-seq high-throughput sequencing, and the disease-related SNP and gene data come from the DisGeNET database. The basic strategy of the literature mining is to search the literature: we searched for G4 and disease keywords in databases such as NCBI to obtain disease-related G4 literature, and then extracted disease-related G4 data from that literature. The genome analysis found that 7,074 SNPs fell on G4, involving 4,134 G4 sequences, and that 2,817 genes contained G4, involving 36,062 G4 sequences. The literature mining collected 438 G4 publications and 240 G4 records, involving 152 genes, 146 diseases, and 1,005 experimental records.

(2) The G4 data were modeled and the disease-related G4 database was established. First, the G4 data were divided into two parts (literature-derived experimental data and whole-genome analysis data), and the data structures and table designs for storing each part were determined. Six data tables were then designed to hold the data: basic information on the G4 experimental data, basic information on the whole-genome G4 data, SNP information, gene information, disease information, and experimental information. The database was built on MySQL, and the existing data were imported into it.

(3) A web-based database browsing and search tool was built. First, the requirements were analyzed and the functions of the tool were designed, including data browsing, data searching, data downloading, a genome browser, and data updating and management. The tool also provides auxiliary functions, such as links to related databases and tool navigation, to make the database easier to use. The search function supports queries by G4 sequence, G4 location, disease, gene and other information, and the browse function allows G4, SNP or gene information to be viewed visually through a JBrowse-based genome browser.
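The positional test at the heart of the whole-genome analysis (does a disease-related SNP fall inside a G4-forming region?) can be illustrated with a minimal sketch. The abstract does not describe the actual implementation, so everything here is an assumption made for illustration: the function name `snps_in_g4`, the tuple layouts, the 1-based inclusive coordinates, and the placeholder identifiers. Real inputs would be parsed from G4-seq output and DisGeNET exports.

```python
from bisect import bisect_right
from collections import defaultdict


def snps_in_g4(g4_regions, snps):
    """Return (snp_id, g4_id) pairs where the SNP lies inside a G4 region.

    g4_regions: iterable of (chrom, start, end, g4_id), 1-based inclusive
                coordinates (an assumed layout, not the thesis's format).
    snps:       iterable of (chrom, pos, snp_id).
    """
    # Group regions by chromosome and sort them by start position.
    by_chrom = defaultdict(list)
    for chrom, start, end, g4_id in g4_regions:
        by_chrom[chrom].append((start, end, g4_id))
    starts = {}
    for chrom, regions in by_chrom.items():
        regions.sort()
        starts[chrom] = [r[0] for r in regions]

    hits = []
    for chrom, pos, snp_id in snps:
        if chrom not in by_chrom:
            continue
        regions = by_chrom[chrom]
        # Only regions with start <= pos can contain the SNP.
        upper = bisect_right(starts[chrom], pos)
        # Regions may overlap, so check every candidate rather than
        # stopping at the first miss.
        for start, end, g4_id in regions[:upper]:
            if end >= pos:
                hits.append((snp_id, g4_id))
    return hits


# Hypothetical example data (placeholder identifiers, not real records).
example_g4 = [
    ("chr1", 100, 130, "G4_a"),
    ("chr1", 120, 150, "G4_b"),
    ("chr2", 10, 40, "G4_c"),
]
example_snps = [
    ("chr1", 125, "rs_x"),  # inside both G4_a and G4_b
    ("chr1", 99, "rs_y"),   # just upstream of G4_a
    ("chr2", 40, "rs_z"),   # on the last base of G4_c
    ("chr3", 5, "rs_w"),    # chromosome with no G4 regions
]
example_hits = snps_in_g4(example_g4, example_snps)
```

The candidate scan is linear in the number of regions that start at or before the SNP, which keeps the sketch short and correct for overlapping regions; a genome-scale run would more likely use an interval tree or an existing interval-intersection tool.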
Keywords/Search Tags: G-quadruplex, disease, database, genome browser