
Clustering Analysis Model Of Co-methylation Pattern Based On RNA m6A Modified High-throughput Data

Posted on: 2022-09-09 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Z Y Liu
GTID: 1480306731998789 | Subject: Information and Communication Engineering
Abstract/Summary:
m6A is the most prevalent methylation modification in mammalian cells and plays a key role in mRNA metabolism, physiology, pathology and other life processes. However, because life processes are complex, many details remain unclear. These include the specific functions of m6A sites at different positions in the genome, which biological processes or diseases each site is related to, and how the sites participate in those processes and affect the onset and progression of disease. Clarifying these details would help reveal the mechanisms of m6A in different life processes at the systems level. It would also support m6A-related drug development and the treatment of complex diseases such as cancer. Addressing these questions through biological experiments alone is usually expensive and time-consuming. Computational methods that cluster co-methylation patterns from m6A high-throughput sequencing data can assist such experiments and reduce their cost. However, this approach currently lacks both reliable datasets and effective mining algorithms. To address these problems, this dissertation analyzes and constructs a dataset of site methylation under different experimental conditions. Based on this dataset, it proposes several clustering algorithms and mines m6A co-methylation patterns at multiple levels, which can support research on the details of m6A regulation in different life processes. The main research work and innovations of this dissertation are as follows:

1) A dataset of m6A methylation for 69,446 sites under 32 experimental conditions was constructed. Existing datasets contain few samples and have insufficient site accuracy. This dissertation therefore reviewed published MeRIP-Seq experiments and collected the corresponding raw sequencing data. The data were preprocessed with steps such as quality control and sequence alignment. They were then combined with sites determined by single-base-resolution experiments such as miCLIP and m6A-CLIP. The result is a MeRIP-Seq dataset of 32 samples from 9 independent studies of different human cell lines. Compared with existing datasets, this dataset integrates more biological samples and has higher site accuracy. To mine m6A co-methylation patterns at multiple levels, the dataset was further processed into IP-sample read counts, input-sample read counts, and methylation level data. It provides reliable data support for the cluster analysis of m6A co-methylation patterns in the subsequent chapters.

2) A Moment-estimating-based Beta Mixture Model (MBMM) clustering algorithm was proposed. The traditional beta mixture model infers its parameters by solving partial-derivative equations. As a result, it cannot directly cluster high-dimensional, small-sample m6A methylation level data. This dissertation introduced method-of-moments estimation of the parameters during model construction, which enables m6A co-methylation pattern mining with a beta mixture model. MBMM is built on the expectation-maximization (EM) framework. In simulation experiments on four datasets, its Rand index was higher than those of four other mainstream algorithms. This indicates that MBMM can accurately recover the co-methylation patterns hidden in the data. On the real dataset, MBMM found seven co-methylation patterns. Pathway specificity analysis and enzyme-specific substrate target site analysis showed that these patterns were enriched, to varying degrees, in two groups: known pathways significantly regulated by m6A, and the specific target sites of m6A regulatory enzymes. Gene Ontology enrichment analysis retained the 10 most enriched biological process terms of each pattern, and no term was shared among the seven patterns. This suggests that the seven patterns may have high regulatory specificity. Comparative experiments showed that the clustering results of MBMM on real data are more biologically meaningful than those of other classic clustering algorithms.

3) A Beta-mixture-distribution-based Biclustering Algorithm (BDBB) was proposed. MBMM mines m6A co-methylation patterns with a beta mixture model, but it cannot capture local co-methylation patterns, that is, patterns shared by some sites under only certain conditions. To address this, the dissertation defined a biclustering structure and improved MBMM with a Gibbs sampling framework. This enables biclustering-level mining of m6A local co-methylation patterns from methylation level data. In simulation experiments, BDBB achieved recovery and relevance scores of 0.994 each. Both scores are significantly better than those of five other mainstream algorithms, and they are 0.327 and 0.28 higher, respectively, than those of Plaid, the best of these. These results show that BDBB accurately recovers the co-methylation patterns hidden in the simulated data. In real-data experiments, BDBB found two local co-methylation patterns, enriched in biological processes such as histone modification and embryonic development. Biological significance analysis showed that both are valid local co-methylation patterns and that they are more biologically meaningful than the clustering results of MBMM.

4) A Beta-binomial-distribution-based Biclustering Algorithm (BBM) was proposed. Clustering on methylation level data requires converting IP-sample and input-sample read counts into site methylation levels. This conversion introduces new noise and distorts the data, which reduces the reliability of the clustering results. To address this, the dissertation assumed that the data follow a beta-binomial distribution. Under this assumption, local m6A co-methylation patterns can be mined directly, without converting counts into methylation levels. BBM uses Gibbs sampling for parameter estimation and biclusters the read counts of IP and input samples directly and simultaneously. In simulation experiments, its recovery and relevance scores were 0.995 and 0.996. These scores are significantly better than those of five other mainstream algorithms, and they are 0.619 and 0.432 higher, respectively, than those of Plaid, the best of these. This indicates that BBM accurately recovers the local co-methylation patterns hidden in the simulated data. In real-data experiments, BBM found two valid local co-methylation patterns, enriched in biological processes such as RNA translation and covalent chromatin modification. Comparative experiments showed that the clustering results of BBM are more biologically meaningful than those of BDBB.

5) An Enhancing Beta-binomial Distribution Biclustering Algorithm (EBBM) based on a data screening strategy was proposed. BBM can mine local co-methylation patterns directly from the read counts of IP and input samples. However, it cannot effectively remove one type of noise in MeRIP-Seq data: sites whose IP read counts are less than 1.5 times the corresponding input read counts. To address this, the dissertation added a data screening strategy to the BBM model to improve its ability to recognize noise. In simulation experiments, the recovery and relevance scores of EBBM were 0.998 and 0.997. These scores are significantly better than those of BBM and five current mainstream algorithms, and they are 0.338 and 0.337 higher, respectively, than those of BBM, the best of the comparison algorithms. EBBM can effectively identify and remove this noise. In real-data experiments, EBBM found two co-methylation patterns, enriched in biological processes such as negative regulation of phosphorylation and peptidyl-lysine methylation. Comparative experiments showed that the clustering results of EBBM are more biologically meaningful than those of BBM.

Finally, the dissertation evaluated and compared three aspects of the four clustering algorithms above: the co-methylation patterns they discovered, the biological significance of those patterns as measured by the GOE score, and the applicability of each algorithm. This research is intended to support the study of m6A mechanisms in different life processes and in the onset and progression of complex diseases such as cancer. It also contributes to uncovering the mysteries of life and to the diagnosis and treatment of disease. The dissertation contains 43 figures, 19 tables, and 166 references.
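The abstract notes two points about read counts. MBMM and BDBB need IP and input read counts converted into site methylation levels, and EBBM treats sites whose IP reads fall below 1.5 times their input reads as noise. The sketch below illustrates both steps. The conversion formula is an assumption for illustration: it normalizes each count by a library-size factor, adds a pseudocount, and takes the IP fraction. The function names are hypothetical and are not the dissertation's definitions. The abstract places the screening inside the BBM model; here it is shown as a stand-alone filter for simplicity.

```python
def methylation_level(ip, inp, ip_size=1.0, inp_size=1.0, pseudo=0.5):
    # Hypothetical conversion (assumption): add a pseudocount so 0/0 is defined,
    # scale each count by its library size factor, and return the IP fraction.
    ip_norm = (ip + pseudo) / ip_size
    inp_norm = (inp + pseudo) / inp_size
    return ip_norm / (ip_norm + inp_norm)


def passes_screen(ip, inp, ratio=1.5):
    # Screening rule stated in the abstract: a site whose IP reads are less
    # than `ratio` times its input reads is treated as noise and dropped.
    return ip >= ratio * inp


# (IP reads, input reads) for two hypothetical sites
sites = {"site_a": (30, 10), "site_b": (12, 10)}
kept = {name: methylation_level(ip, inp)
        for name, (ip, inp) in sites.items()
        if passes_screen(ip, inp)}
print(kept)
```

Here `site_b` is screened out because 12 < 1.5 × 10. The pseudocount is a design choice that keeps low-coverage sites from producing undefined or extreme levels. The abstract argues that this conversion step itself adds noise, which motivates BBM's direct modeling of counts.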
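Item 2) describes MBMM as a beta mixture model fitted within an EM framework, with method-of-moments estimation in place of partial-derivative (likelihood) equations. The abstract does not give MBMM's exact equations. The sketch below therefore rests on three assumptions: each cluster has an independent beta distribution per condition, the M-step matches weighted means and variances to beta moments, and sites are initialized by splitting them into k groups by mean level. The function names are hypothetical.

```python
import math


def beta_logpdf(x, a, b):
    # Log-density of Beta(a, b) at x, computed with log-gamma for stability.
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def weighted_moment_beta(xs, ws, eps=1e-6):
    # Method of moments: the beta mean is m = a / (a + b) and the variance is
    # m (1 - m) / (a + b + 1); solve both for a and b using weighted moments.
    total = max(sum(ws), 1e-12)
    m = sum(w * x for x, w in zip(xs, ws)) / total
    m = min(max(m, 1e-3), 1 - 1e-3)
    v = sum(w * (x - m) ** 2 for x, w in zip(xs, ws)) / total
    v = min(max(v, eps), m * (1 - m) - eps)  # a valid beta needs v < m (1 - m)
    common = m * (1 - m) / v - 1  # equals a + b
    return m * common, (1 - m) * common


def mbmm_em(data, k, n_iter=50, clip=1e-6):
    # data: one row per site, one methylation level in (0, 1) per condition.
    # Simplifying assumption: conditions are independent within a cluster.
    data = [[min(max(x, clip), 1 - clip) for x in row] for row in data]
    n, d = len(data), len(data[0])
    # Heuristic initialisation: split sites into k groups by mean level.
    order = sorted(range(n), key=lambda i: sum(data[i]) / d)
    resp = [[0.0] * k for _ in range(n)]
    for rank, i in enumerate(order):
        resp[i][rank * k // n] = 1.0
    for _ in range(n_iter):
        # M-step: mixing weights and per-cluster, per-condition beta parameters.
        weights = [sum(r[j] for r in resp) / n for j in range(k)]
        params = [[weighted_moment_beta([row[c] for row in data],
                                        [r[j] for r in resp])
                   for c in range(d)] for j in range(k)]
        # E-step: posterior cluster probabilities, normalised in log space.
        for i, row in enumerate(data):
            logp = [math.log(max(weights[j], 1e-300))
                    + sum(beta_logpdf(row[c], *params[j][c]) for c in range(d))
                    for j in range(k)]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            s = sum(p)
            resp[i] = [x / s for x in p]
    labels = [max(range(k), key=lambda j: resp[i][j]) for i in range(n)]
    return labels, params
```

The moment-matching M-step has a closed form. This is the property the abstract credits for making beta mixtures usable on high-dimensional, small-sample data.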
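Item 2) reports MBMM's accuracy on simulated data as a Rand index. For reference, the sketch below computes the unadjusted Rand index between true and predicted cluster labels. The abstract does not say whether this form or the adjusted Rand index was used.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    # Fraction of site pairs on which the two partitions agree: either both
    # put the pair in the same cluster, or both put it in different clusters.
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum((labels_true[i] == labels_true[j])
                == (labels_pred[i] == labels_pred[j])
                for i, j in pairs)
    return agree / len(pairs)


print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

The index ignores label names, so the swapped labels above still score 1.0.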
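Items 3) to 5) score BDBB, BBM, and EBBM on simulated data by "recovery" and "relevance", but the abstract does not define these measures. The sketch below uses one common Jaccard-based convention, which is an assumption here. A bicluster is a set of sites paired with a set of conditions, and two biclusters are compared by the Jaccard index of their (site, condition) cells. Recovery averages, over planted biclusters, the best match among found ones. Relevance averages, over found biclusters, the best match among planted ones.

```python
def jaccard(b1, b2):
    # Each bicluster is (set_of_sites, set_of_conditions); compare cell sets.
    (r1, c1), (r2, c2) = b1, b2
    inter = len(r1 & r2) * len(c1 & c2)
    union = len(r1) * len(c1) + len(r2) * len(c2) - inter
    return inter / union if union else 0.0


def match_score(reference, candidates):
    # Mean, over reference biclusters, of the best Jaccard match among candidates.
    if not reference or not candidates:
        return 0.0
    return sum(max(jaccard(r, c) for c in candidates)
               for r in reference) / len(reference)


def recovery(true_bics, found_bics):
    return match_score(true_bics, found_bics)


def relevance(true_bics, found_bics):
    return match_score(found_bics, true_bics)


planted = [({0, 1, 2}, {0, 1})]
found = [({0, 1}, {0, 1}), ({5}, {5})]
print(recovery(planted, found), relevance(planted, found))
```

Relevance penalizes spurious biclusters such as `({5}, {5})`, while recovery penalizes planted patterns that are missed or only partly found. Reporting both, as the abstract does, covers both kinds of error.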
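Item 4) states that BBM models read counts directly with a beta-binomial distribution instead of converting them into methylation levels. The sketch below shows the standard beta-binomial log-probability. It assumes a mapping that the abstract does not state: for one site under one condition, the IP read count is the number of successes out of the IP-plus-input total, and the methylation probability is drawn from Beta(alpha, beta). A Gibbs sampler such as the one BBM uses would evaluate terms like this when reassigning sites and conditions to biclusters. The sampler itself is omitted.

```python
import math


def log_beta(a, b):
    # Log of the beta function B(a, b).
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k, n, alpha, beta):
    # log P(K = k) for K ~ BetaBinomial(n, alpha, beta):
    # log C(n, k) + log B(k + alpha, n - k + beta) - log B(alpha, beta)
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


# Hypothetical site: 30 IP reads and 10 input reads, scored under two
# hypothetical parameter sets (high vs. low methylation).
ip, inp = 30, 10
for alpha, beta in [(8.0, 2.0), (2.0, 8.0)]:
    print(alpha, beta, beta_binomial_logpmf(ip, ip + inp, alpha, beta))
```

Compared with a plain binomial, the beta prior allows extra variation between replicates, and the counts are used as observed rather than reduced to a single level.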
Keywords/Search Tags: m6A, methylation, co-methylation pattern, clustering, biclustering