CpG Island Prediction Analysis And Visualization Platform Construction Of Human Genome Based On Parallel Acceleration Technology

Posted on: 2022-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Dai
GTID: 2480306551970119
Subject: Computer Science and Technology
Abstract/Summary:
With the development of high-throughput sequencing technology, bioinformatics has entered the era of multi-omics big data at the exabyte (EB) scale. Quickly transforming this biological big data into new knowledge that can be applied to further research depends on data mining technology. Studies of the human genome have shown that it contains CpG islands, a special class of DNA sequence closely related to the regulation of human gene expression. It is therefore important to study the sequence specificity of CpG islands, the distribution of CpG sites, CpG island methylation, and the relationship between CpG islands and the genes and gene functions associated with them. The purpose of this thesis is to use big data mining and analysis methods to study the sequence characteristics and methylation of CpG islands.

A survey of current research at home and abroad shows that the following problems remain in the study of CpG islands:
(1) Previous studies divided CpG islands into a high-density group, a medium-density group, and a low-density group according to CpG island density, but the correlation between CpG island density and the regulation of gene expression is still unclear.
(2) When existing CpG island recognition algorithms are applied to large genomic data sets, they run slowly and consume too much memory.
(3) Existing web service platforms mainly offer simple statistics and downloads of CpG island data, and provide no visualization methods for studying and displaying the relationships among the genetic characteristics of CpG islands.

In view of these problems, the main research contents and results are as follows:
(1) This thesis introduces CpG island density as a research parameter into the CpG island big data research model and conducts a multi-dimensional correlation analysis of CpG island gene sequence data, covering CpG island density, sequence characteristics, methylation state, gene expression specificity, CpG site distribution, and related aspects. It proposes a new method for annotating and analyzing human genome big data based on the GTEx project, the CPGCluster algorithm, and GO enrichment analysis. The analysis yields three results. First, the thesis discusses why CpG island-related genes are mainly regulated by the high-density CpG island group and why housekeeping genes are mainly regulated by CpG island-related mechanisms. Second, it finds that the HCGI/TATA± group and the LCGI/TATA± group show different GO enrichment functions, while the ICGI/TATA± group shows poor GO enrichment results. Third, it demonstrates the importance of CpG density and CpG spacing in the study of CpG islands.
(2) This thesis designs MR-CPGCluster, a distributed algorithm based on the MapReduce and Hadoop Streaming frameworks, and shows that it has higher parallel performance and computing efficiency than the original CPGCluster algorithm. On larger volumes of biological data, it achieves a higher speedup ratio, better scalability, and better scale growth.
(3) This thesis develops a visual big data analysis web platform that provides CpG island research functions, supports online analysis, visualization, and download of CpG island research data, and presents the CpG island research content and results of this thesis.
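To make the map/reduce idea behind a distance-based CpG clustering concrete, the following is a minimal sketch and not the thesis's MR-CPGCluster implementation. It assumes a simplified rule in which neighbouring CpG sites are merged into one cluster when the gap between them is at most `max_gap`, and it omits any statistical filtering a full algorithm might apply. The function names, the `max_gap` and `min_cpgs` parameters, and the chunk format `(chromosome, offset, sequence)` are all hypothetical. The map and reduce steps run in-process here; under Hadoop Streaming they would presumably be separate scripts exchanging key/value lines over standard input and output.

```python
from itertools import groupby


def map_cpg_positions(chrom, offset, sequence):
    """Map step (assumed form): emit (chrom, absolute position) per CpG site.

    Note: a "CG" split across two chunks is missed unless chunks overlap
    by one base; the example chunks below avoid that case.
    """
    seq = sequence.upper()
    pos = seq.find("CG")
    while pos != -1:
        yield chrom, offset + pos
        pos = seq.find("CG", pos + 1)


def reduce_clusters(positions, max_gap, min_cpgs=3):
    """Reduce step (assumed form): merge sorted CpG sites into clusters.

    Two neighbouring sites share a cluster when their start positions
    differ by at most max_gap. Clusters with fewer than min_cpgs sites
    are dropped.
    """
    clusters, current = [], []
    for p in sorted(positions):
        if current and p - current[-1] > max_gap:
            if len(current) >= min_cpgs:
                clusters.append(current)
            current = []
        current.append(p)
    if len(current) >= min_cpgs:
        clusters.append(current)

    results = []
    for c in clusters:
        start, end = c[0], c[-1] + 2  # end is exclusive and covers the last "CG"
        results.append({
            "start": start,
            "end": end,
            "cpgs": len(c),
            "density": len(c) / (end - start),  # CpG sites per base
        })
    return results


def run_local(chunks, max_gap):
    """Simulate shuffle/sort locally: group mapped pairs by chromosome."""
    pairs = sorted(kv for chunk in chunks for kv in map_cpg_positions(*chunk))
    return {
        chrom: reduce_clusters([p for _, p in group], max_gap)
        for chrom, group in groupby(pairs, key=lambda kv: kv[0])
    }


if __name__ == "__main__":
    # Two chunks of one toy sequence; offsets give absolute coordinates.
    chunks = [("chr1", 0, "ACGCGCG"), ("chr1", 7, "T" * 10 + "ACGT")]
    print(run_local(chunks, max_gap=4))
```

Keying the map output by chromosome lets each reducer see all CpG positions for one chromosome in sorted order, which is what a gap-based clustering needs. The per-cluster `density` value is the kind of quantity that could then be used to assign islands to high-, medium-, or low-density groups, although the thresholds for those groups are not given in the abstract.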
Keywords/Search Tags: CpG island, data mining, distributed computing, data visualization