Research On DNA Sites Prediction In The Framework Of Deep Learning

Posted on: 2019-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: P Wang
GTID: 2370330563991958
Subject: Statistics
Abstract/Summary:
With the emergence of high-throughput detection technology, molecular biology databases have expanded geometrically. This has prompted biologists to use machine learning to solve a series of research problems in molecular bioinformatics. In this paper, we mainly study the prediction of splicing sites and the identification of promoters and their strength, based on the deep sparse auto-encoder. The main research work is as follows:

1) Several commonly used feature extraction methods are introduced, and the research progress of machine learning methods for DNA site prediction is summarized. Several conventional machine learning methods, such as support vector machines, random forests and libD3C, as well as the popular deep sparse auto-encoder, are reviewed. The evaluation indexes of classification algorithms are also systematically analyzed.

2) Analysis and prediction of DNA splicing sites. Gene splicing is one of the most significant biological processes in eukaryotic gene expression. Identifying splicing sites in DNA/RNA sequences is therefore significant for both biomedical research and the discovery of new drugs. To identify splice donor sites and splice acceptor sites accurately and quickly, a deep sparse auto-encoder model with two hidden layers, called iSS-PC, was constructed based on the minimum error law. In this model, twelve physicochemical properties of the dinucleotides within DNA were incorporated into PseDNC to formulate the given sequence samples via a series of cross-covariance and auto-covariance transformations. Five-fold cross-validation results on the same benchmark datasets indicated that the new predictor markedly outperformed the existing prediction methods in this field. For the convenience of biological researchers, an easy-to-use web server for identifying splicing sites has been established for free access at: http://www.jci-bioinfo.cn/iSS-PC.

3) Prediction of promoters and their strength. Promoters play an important role in gene transcriptional regulation. Based on their distinct levels of transcriptional activation and expression, promoters can be divided into two categories by strength: strong promoters and weak promoters. Generally, strong promoters control either transcriptional regulators or functions related to porin homeostasis, and can therefore increase transcription frequency and the expression level of foreign genes. It is thus necessary to predict which strength type an identified promoter belongs to. In the current study, we first adopted hybrid features for feature extraction, including the moving average method, pseudo trinucleotide composition and nucleotide density, using seven physicochemical properties of trinucleotides. Several models were then constructed based on different classification algorithms, such as support vector machine, random forest and deep sparse auto-encoder. Second, we used the physicochemical properties and nucleotide density of nucleotides to extract features and constructed a prediction model based on SVM. By comparing the corresponding five-fold cross-validation results, we concluded that the latter approach was better than the former.

4) Finally, the research work of this paper is summarized and future research directions are discussed, including the use of other deep learning models, further analysis and discussion of other DNA modification sites, the extraction of DNA structural information, and the application of the research methods presented in this paper to RNA site prediction.
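The auto-covariance and cross-covariance transformations mentioned in the abstract can be illustrated with a minimal Python sketch. This is a generic version of the transform, not the exact iSS-PC formulation: the property tables in `TOY_PROPERTIES` hold hypothetical placeholder values (the twelve real physicochemical properties are not given here), and the function names and the `max_lag` parameter are assumptions made for illustration.

```python
from itertools import product

# All 16 dinucleotides over the DNA alphabet.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]

# HYPOTHETICAL placeholder values, not real physicochemical data.
TOY_PROPERTIES = {
    "prop_a": {dn: float(i) for i, dn in enumerate(DINUCLEOTIDES)},
    "prop_b": {dn: float((i * 7) % 16) for i, dn in enumerate(DINUCLEOTIDES)},
}


def property_tracks(seq, properties):
    """Map a DNA sequence to one numeric track per property.

    Each track holds one value per overlapping dinucleotide in the sequence.
    """
    dinucs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [[table[d] for d in dinucs] for table in properties.values()]


def covariance(track_u, track_v, lag):
    """Lagged covariance between two tracks of equal length.

    Passing the same track twice gives the auto-covariance;
    passing two different tracks gives the cross-covariance.
    """
    n = len(track_u)
    mean_u = sum(track_u) / n
    mean_v = sum(track_v) / n
    terms = [
        (track_u[i] - mean_u) * (track_v[i + lag] - mean_v)
        for i in range(n - lag)
    ]
    return sum(terms) / (n - lag)


def acc_features(seq, properties, max_lag):
    """Fixed-length feature vector: every property pair at every lag 1..max_lag."""
    tracks = property_tracks(seq, properties)
    features = []
    for lag in range(1, max_lag + 1):
        for track_u in tracks:
            for track_v in tracks:
                features.append(covariance(track_u, track_v, lag))
    return features


if __name__ == "__main__":
    vector = acc_features("ACGTACGTAC", TOY_PROPERTIES, max_lag=3)
    print(len(vector))  # number of features = max_lag * (number of properties) ** 2
```

With P properties and a maximum lag of L, the vector has L × P² entries regardless of sequence length, which is what allows sequences of different lengths to be fed to a fixed-input classifier such as an auto-encoder or an SVM.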
Keywords/Search Tags:splicing site, strong promoter, weak promoter, deep sparse auto-encoder, libD3C