
Genome-wide Prediction of DNA Methylation Using Abelian Complexity

Posted on: 2019-02-01; Degree: Master; Type: Thesis
Country: China; Candidate: Y X Liu
GTID: 2370330545496380; Subject: Bioinformatics
Abstract/Summary:
Although the DNA sequences that carry genetic information are almost invariant across human tissue cells, the epigenetic features on them are highly diverse, and this diversity is considered a main cause of cell-type-specific gene regulation. DNA methylation is perhaps the most thoroughly studied epigenetic feature. Changes in DNA methylation levels are closely related to the selective expression and regulation of genes, and methylation plays a key role in genomic imprinting and X-chromosome inactivation. Studies have shown that abnormal hypermethylation of gene regulatory elements, including promoters, can lead to various diseases such as cancer. Accurately identifying methylation levels in a given region would therefore not only help reveal the mechanisms underlying transcriptional regulation but also improve our understanding of how various diseases develop.

Early on, investigators relied on experimental methods to determine DNA methylation sites. However, experimental methods are time-consuming and costly, and, worse, they cannot cover all CpG sites in the genome. An alternative strategy is to use computational methods to infer DNA methylation levels at the CpG sites that remain unmeasured. In recent years, with the widespread use of machine learning, researchers have begun to apply machine learning algorithms to build prediction models for DNA methylation. The success of such methods, however, depends heavily on the quality of the input features.

This study proposes a novel feature extraction algorithm for DNA sequences based on "Abelian complexity" and builds a genome-wide DNA methylation prediction model for humans. Abelian complexity, a mathematical concept from the field of combinatorics on words, had not previously been used for DNA sequence feature extraction. First, because the size of the window centered on each CpG site affects prediction accuracy, we tested all window sizes from 100 bp to 2000 bp (step size 100 bp; bp = base pairs) for each chromosome and found that a 1300 bp window gave the best predictive performance. Next, we used chi-square statistics and mutual information to screen the original 1301-dimensional Abelian complexity features, and found that dimensions 14 to 50 differed most between methylated and unmethylated sites. In addition, because DNA composition features have long served as basic features in DNA methylation prediction, we combined them with the Abelian complexity features, which significantly improved the model's predictive ability. Finally, to select the most suitable machine learning method, we compared four common algorithms: support vector machine (SVM), random forest, k-nearest neighbors, and naive Bayes. Across the 5 tested cell lines, SVM gave higher and more stable prediction results.

In summary, this study applies Abelian complexity to DNA sequences for the first time. Our DNA methylation prediction model is built on an integrated pipeline of window size optimization, univariate feature selection, and SVM learning. Genome-wide prediction of CpG methylation levels can narrow the range of candidate targets and reduce experimental difficulty, providing a reference and guidance for related experiments; it can also help analyze the transcriptional regulation mechanisms of complex human diseases and improve the genomic annotation of functional elements.
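The central feature in this pipeline is the Abelian complexity function. In combinatorics on words, the Abelian complexity a_w(n) of a word w is the number of distinct Parikh vectors (here, the counts of A, C, G and T) among all length-n factors (substrings) of w. The sketch below computes this profile for a window centered on a CpG site. It is a minimal illustration under stated assumptions, not the thesis's implementation: the helper names `centered_window`, `abelian_complexity` and `abelian_profile`, the 0-based CpG coordinate, the centering convention, and the choice of n = 1..L as the feature dimensions are all hypothetical. The abstract reports 1301 dimensions for the 1300 bp window without stating the exact indexing.

```python
ALPHABET = "ACGT"


def centered_window(chrom_seq: str, cpg_pos: int, size: int):
    """Return the `size`-bp substring centered (as closely as possible) on the
    0-based position `cpg_pos`, or None if it would run off the sequence.
    Hypothetical helper: the thesis's centering convention is not stated."""
    start = cpg_pos - size // 2
    end = start + size
    if start < 0 or end > len(chrom_seq):
        return None
    return chrom_seq[start:end]


def abelian_complexity(seq: str, n: int) -> int:
    """Count the distinct Parikh vectors (A, C, G, T counts) over all
    length-n factors of seq."""
    seq = seq.upper()
    if set(seq) - set(ALPHABET):
        raise ValueError("sequence must contain only A, C, G, T")
    if not 1 <= n <= len(seq):
        raise ValueError("n must satisfy 1 <= n <= len(seq)")
    counts = {b: 0 for b in ALPHABET}
    for b in seq[:n]:
        counts[b] += 1
    seen = {tuple(counts[b] for b in ALPHABET)}
    # Slide the length-n window one base at a time, updating counts in O(1).
    for i in range(n, len(seq)):
        counts[seq[i]] += 1
        counts[seq[i - n]] -= 1
        seen.add(tuple(counts[b] for b in ALPHABET))
    return len(seen)


def abelian_profile(seq: str) -> list:
    """Feature vector [a(1), a(2), ..., a(len(seq))] for one window."""
    return [abelian_complexity(seq, n) for n in range(1, len(seq) + 1)]
```

For example, `abelian_profile("ACAC")` returns `[2, 1, 2, 1]`: the length-2 factors AC, CA and AC all share the Parikh vector (1, 1, 0, 0), while the length-3 factors ACA and CAC differ. Each profile costs O(L^2) time for a window of L bases (one sliding pass per n), which is slow but acceptable for a sketch; the window size L is the parameter the thesis scanned from 100 to 2000 bp.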
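The univariate screening step can be sketched as follows. This sketch, under stated assumptions, scores each feature column independently against the methylated/unmethylated label with either a Pearson chi-square statistic on the (feature value × label) contingency table or a plug-in mutual information estimate in bits, treating each integer Abelian complexity value as a discrete category. The abstract does not specify the estimators or selection thresholds, so `chi_square`, `mutual_information` and `rank_features` are hypothetical helpers rather than the author's code; libraries such as scikit-learn provide their own versions whose definitions differ in detail.

```python
import math
from collections import Counter


def chi_square(feature, labels):
    """Pearson chi-square statistic of the (feature value x label) contingency table."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    px, py = Counter(feature), Counter(labels)
    stat = 0.0
    for x, cx in px.items():
        for y, cy in py.items():
            expected = cx * cy / n
            stat += (joint[(x, y)] - expected) ** 2 / expected
    return stat


def mutual_information(feature, labels):
    """Plug-in mutual information (bits) between a discrete feature and the labels."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    px, py = Counter(feature), Counter(labels)
    # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def rank_features(X, labels, score=mutual_information):
    """Return (column indices sorted by decreasing score, per-column scores)."""
    if not X or len(X) != len(labels):
        raise ValueError("X must be a non-empty list of rows matching labels")
    scores = [score([row[j] for row in X], labels) for j in range(len(X[0]))]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores
```

A column whose values separate the two classes perfectly scores highest under both statistics, while a constant column scores zero. In practice the top-ranked columns (the thesis reports dimensions 14 to 50 as the most discriminative) would be kept before training the classifier.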
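The abstract does not list which DNA composition features were combined with the Abelian complexity profile. As one common choice, and purely as an assumption, the sketch below computes normalized mononucleotide and dinucleotide (k-mer) frequencies for a window; the function name `composition_features` and the default `k_max=2` are hypothetical. The combined input for the classifier would then be the concatenation of this list with the selected Abelian complexity dimensions.

```python
from itertools import product


def composition_features(seq: str, k_max: int = 2) -> list:
    """Normalized k-mer frequencies for k = 1..k_max, in lexicographic ACGT order.

    Hypothetical feature set. k-mers containing characters other than
    A, C, G, T (e.g. N) are not counted but still occupy a position in
    the denominator.
    """
    seq = seq.upper()
    features = []
    for k in range(1, k_max + 1):
        kmers = ["".join(p) for p in product("ACGT", repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        total = len(seq) - k + 1  # number of overlapping k-mer positions
        for i in range(max(total, 0)):
            kmer = seq[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
        features.extend(counts[m] / total if total > 0 else 0.0 for m in kmers)
    return features
```

With `k_max=2` the vector has 4 + 16 = 20 entries; the CpG dinucleotide frequency, often singled out in methylation studies, is the entry for "CG".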
Keywords/Search Tags: DNA methylation prediction, Abelian complexity, DNA composition features, machine learning, feature screening