Methylation of cytosine-phosphate-guanine (CpG) sites is closely related to gene expression regulation, cell pathway activation, ontogeny and disease progression. As methylation detection methods have improved, so have the strategies for analyzing high-throughput methylation data, and methods that treat multiple adjacent CpGs as a panel have become increasingly common. Adjacent methylated CpG sites can often form a large methylation correlation block (MCB). This strategy has proven effective, but there is still no systematic analysis of how to model methylated regions, nor are there established workflows and software for analyzing MCBs. This thesis therefore takes MCBs as its research target, systematically studies their methylation and the associated regulation of gene expression, and develops models, analysis procedures, software packages and databases.

Based on high-throughput methylation profiles, we used correlation methods to analyze the correlations between adjacent CpGs. We characterized MCB properties such as the number of CpGs, length, chromosomal location and methylation changes within each block. MCB length and CpG count differ among chromosomal loci. Analysis of the methylation distribution showed that the most dramatic methylation variation occurs at the positions of peaks and troughs. We also found that MCBs can regulate gene expression from different loci: upstream of, within, and downstream of the gene. Four major regulation patterns were observed: positive regulation, negative regulation, adjacent conflicting regulation and long-range conflicting regulation. Negative regulation predominates, accounting for 68.35% of MCBs, whereas adjacent conflicting and long-range conflicting regulation account for only 7.76% and 4.65% of MCBs, respectively. Within negative regulation, the main mode is negative regulation by a single region.

We also determined the relationship between MCBs and topologically associating domains (TADs) and chromatin loops, since spatially adjacent DNA regions may tend to be co-methylated. A large number of MCBs are located within TADs, and within loops the correlation between MCBs is significantly higher than in non-loop regions.

To assess differentially methylated blocks, we developed an attractor framework that combines information on the variances of CpG sites within MCBs with that of global CpGs. Using this framework, we compared different submodels; using the Kolmogorov-Smirnov test as the secondary method achieved the best classification performance, with an area under the receiver operating characteristic curve (AUC) of ~0.90. For prognostic analysis, we propose a weighted stacking learning method that combines four methods: Cox regression, elastic net, support vector regression and mboost. Applied to pan-cancer datasets, the MCB models reached a test-set AUC of up to ~0.70. Based on the Library of Integrated Network-Based Cellular Signatures (LINCS) database, a small number of compounds were ranked by a reverse gene expression score; top-ranked compounds may inhibit the expression of oncogenes or promote the expression of tumor suppressor genes.

In conclusion, this thesis takes MCBs as its research object. First, using pan-cancer databases, we characterized the genomic features of MCBs and discovered a series of patterns by which MCBs regulate genes. Second, we developed an attractor framework and a weighted stacking learning ensemble model to assess the predictive ability of MCBs. Third, we identified compounds with the potential to reverse gene expression. Finally, based on these results, we developed EnMCB, an R package for analyzing differentially methylated blocks and predicting survival, together with a series of databases for analyzing MCBs.
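As a minimal sketch of the correlation-based MCB idea described in this abstract, the Python below scans CpGs on one chromosome in genomic order and merges neighbours into a block while their methylation profiles across samples stay correlated. The use of Pearson correlation, the cut-offs `r_min`, `max_gap` and `min_cpgs`, and the function name `find_mcbs` are illustrative assumptions, not the thesis's published parameters or an EnMCB function.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def find_mcbs(
    positions: Sequence[int],
    beta: Sequence[Sequence[float]],
    r_min: float = 0.8,   # hypothetical correlation cut-off between neighbours
    max_gap: int = 1000,  # hypothetical maximum distance (bp) between neighbours
    min_cpgs: int = 2,    # hypothetical minimum block size
) -> List[Tuple[int, int]]:
    """Return (first_index, last_index) for each run of adjacent, correlated CpGs.

    positions: sorted genomic coordinates of CpGs on one chromosome.
    beta: beta[i] holds the methylation values of CpG i across all samples.
    """
    blocks: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(positions) + 1):
        # CpG i joins the current block only if it is close to, and correlated with, CpG i-1.
        linked = (
            i < len(positions)
            and positions[i] - positions[i - 1] <= max_gap
            and pearson(beta[i - 1], beta[i]) >= r_min
        )
        if not linked:
            if i - start >= min_cpgs:
                blocks.append((start, i - 1))
            start = i
    return blocks
```

For example, with CpGs at 100, 150, 200, 5000 and 5050 bp whose neighbouring profiles are strongly correlated, the 4.8 kb gap splits the series and two blocks, `(0, 2)` and `(3, 4)`, are returned.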
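The four regulation patterns named in this abstract can be illustrated with a hedged classification rule. The sketch assumes that each MCB linked to a gene has already been summarized by its genomic midpoint and by the correlation between its mean methylation and the gene's expression. The |r| cut-off, the 10 kb boundary between "adjacent" and "long range", and the function name `classify_pattern` are all hypothetical choices made for illustration, not the thesis's definitions.

```python
from typing import List, Tuple


def classify_pattern(
    mcbs: List[Tuple[int, float]],  # (MCB midpoint in bp, correlation with gene expression)
    r_min: float = 0.3,             # hypothetical cut-off on |r| for a regulating MCB
    adjacent_bp: int = 10_000,      # hypothetical "adjacent" vs "long range" boundary
) -> str:
    """Assign one gene's set of MCBs to a regulation pattern."""
    strong = [(pos, r) for pos, r in mcbs if abs(r) >= r_min]
    if not strong:
        return "none"
    pos_sites = [p for p, r in strong if r > 0]
    neg_sites = [p for p, r in strong if r < 0]
    if not pos_sites:
        return "negative"  # every regulating MCB lowers expression as methylation rises
    if not neg_sites:
        return "positive"
    # Mixed signs: the closest pair of opposite-sign MCBs decides adjacent vs long range.
    closest = min(abs(p - q) for p in pos_sites for q in neg_sites)
    return "adjacent conflicting" if closest <= adjacent_bp else "long range conflicting"
```

Counting the labels over all genes would then give pattern proportions of the kind reported above, under whatever thresholds are actually chosen.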
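The abstract does not spell out how the attractor framework scores a block, so the following covers only the Kolmogorov-Smirnov component, as a hedged illustration. It assumes, purely for the sake of example, that the test compares the distribution of per-CpG variances inside one MCB with the distribution of per-CpG variances across global CpGs; `ks_statistic` and `cpg_variances` are hypothetical helper names. In practice a library routine such as SciPy's `ks_2samp` would also supply a p-value.

```python
from bisect import bisect_right
from statistics import variance
from typing import List, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are right-continuous step functions, so the largest
    # gap between them is reached at one of the observed values.
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d


def cpg_variances(beta: Sequence[Sequence[float]]) -> List[float]:
    """Sample variance of each CpG's methylation across samples (one row per CpG)."""
    return [variance(row) for row in beta]
```

Under the stated assumption, `ks_statistic(cpg_variances(block_beta), cpg_variances(global_beta))` gives a D close to 1 when the block's CpGs vary very differently from the genome-wide background and close to 0 when they do not.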
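The weighted stacking step can be sketched as follows, under explicit assumptions: each of the four base models (Cox regression, elastic net, support vector regression, mboost) has already produced a risk score per patient on a validation set, oriented so that higher means worse prognosis (for example, an SVR-predicted survival time would be negated first). The weighting rule, which weights each model by how far its Harrell's C-index exceeds 0.5, is a hypothetical choice and not necessarily the thesis's scheme; the thesis reports AUC, which this sketch does not compute.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def c_index(time: Sequence[float], event: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's concordance index: a higher risk should mean an earlier event."""
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(time)):
            if time[j] > time[i]:  # subject j outlived subject i's event
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties count as half-concordant
    return concordant / comparable if comparable else 0.5


def zscore(xs: Sequence[float]) -> List[float]:
    """Put one base model's scores on a common scale before mixing."""
    m, s = mean(xs), pstdev(xs)
    return [0.0] * len(xs) if s == 0 else [(x - m) / s for x in xs]


def stack_weights(time, event, base_risks: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical rule: weight each model by its C-index gain over random (0.5)."""
    gains = {m: max(c_index(time, event, r) - 0.5, 0.0) for m, r in base_risks.items()}
    total = sum(gains.values())
    if total == 0:
        return {m: 1.0 / len(gains) for m in gains}  # fall back to equal weights
    return {m: g / total for m, g in gains.items()}


def stacked_risk(weights: Dict[str, float], base_risks: Dict[str, List[float]]) -> List[float]:
    """Weighted sum of z-scored base-model risk scores."""
    scaled = {m: zscore(r) for m, r in base_risks.items()}
    n = len(next(iter(scaled.values())))
    return [sum(weights[m] * scaled[m][k] for m in weights) for k in range(n)]
```

Learning the weights on a validation split and then applying `stacked_risk` to a held-out test set keeps the ensemble's test performance honest; a model no better than chance simply receives zero weight.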
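The reverse gene expression score used to rank LINCS compounds can be approximated, for illustration only, by the negative Spearman correlation between a disease signature (for example, tumour-versus-normal log fold changes) and a compound's expression signature over their shared genes. This is a simplified stand-in, not the LINCS/CMap connectivity score and not necessarily the thesis's exact formula; `reversal_score` and the toy gene names are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)


def reversal_score(disease: Dict[str, float], compound: Dict[str, float]) -> float:
    """+1 = compound fully reverses the disease signature, -1 = fully mimics it."""
    genes = sorted(set(disease) & set(compound))
    if len(genes) < 3:
        raise ValueError("need at least three shared genes")
    rd = ranks([disease[g] for g in genes])
    rc = ranks([compound[g] for g in genes])
    return -pearson(rd, rc)  # Spearman rho is Pearson on ranks; negate for reversal
```

Sorting compounds by this score in descending order puts first those that push oncogene-like, up-regulated genes down and tumour-suppressor-like, down-regulated genes up.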