
An Enhancer Identification Algorithm Based On Deep Learning

Posted on: 2019-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: D Dong
Full Text: PDF
GTID: 2370330611993339
Subject: Computer Science and Technology
Abstract/Summary:
This thesis studies algorithms for enhancer identification. An enhancer is a cis-acting element in the non-coding region of the genome that regulates the transcription frequency of its target genes. Enhancers strongly affect phenotypic differences, biological evolution, disease incidence, and more. Their characteristics, such as long-distance action, orientation independence, and cell specificity, make them difficult to identify. Existing identification methods are either experimental methods, which are time-consuming and laborious, or traditional machine learning algorithms, which rely on complex and unsatisfactory hand-crafted feature extraction. Based on deep learning, this thesis designs an artificial neural network named BiLSTM-E for large-scale prediction of enhancers across the whole human genome.

From a data mining perspective, the similarity between the sequences of a training set determines whether a model can learn generalizable information. Multiple sequence alignment (MSA) is a technique for measuring the similarity between multiple sequences. However, no available MSA algorithm can align large-scale sequence sets both quickly and accurately. This thesis therefore develops an MSA algorithm for very large data sets, called VCSRA, which provides a data set selection method for BiLSTM-E.

The research comprises the following three points:

1. We optimized the center-star strategy commonly used in MSA algorithms by means of a vector-valued mapping. The new center-star strategy, VCS, maps sequences into four-dimensional vectors and can select the center sequence in linear time, which greatly reduces alignment time without loss of accuracy.

2. On the basis of VCS, we implemented the MSA algorithm VCSRA and parallelized it with MPI/OpenMP. Experiments showed that VCSRA achieves a speedup of about 86-fold and outperforms mainstream MSA algorithms. In addition, VCSRA is suitable for aligning sequences of any length and similarity.

3. We built the deep learning model BiLSTM-E for enhancer prediction. BiLSTM-E takes DNA sequences directly as input. By optimizing and adjusting the model structure and hyperparameters, we enabled BiLSTM-E to learn enhancer features, that is, the neural network converges during training. Extensive tests showed that BiLSTM-E outperforms mainstream identification models: its accuracy is at least 90.4% and its AUC is above 0.924. The tests also showed that BiLSTM-E generalizes well.
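To illustrate why mapping sequences to four-dimensional vectors allows the center sequence to be chosen in linear time (point 1), here is a minimal Python sketch. The abstract does not specify the mapping used by VCS, so the sketch assumes a simple one: each sequence becomes its vector of A/C/G/T frequencies, and the center is the sequence whose vector lies closest to the mean vector. The function names `composition_vector` and `select_center` are hypothetical and are not taken from the thesis.

```python
from collections import Counter

ALPHABET = "ACGT"


def composition_vector(seq):
    """Map a DNA sequence to a 4-dimensional vector of base frequencies.

    Assumption: this mapping is illustrative only; the thesis's actual
    vector-valued mapping is not described in the abstract.
    """
    counts = Counter(seq.upper())
    # Count only A/C/G/T; guard against division by zero for empty input.
    total = sum(counts[base] for base in ALPHABET) or 1
    return tuple(counts[base] / total for base in ALPHABET)


def select_center(seqs):
    """Return the index of the sequence whose vector is nearest the mean.

    One pass builds the vectors, one pass computes the mean, and one pass
    finds the minimum distance, so the cost is linear in the total input
    length. No pairwise alignments are computed.
    """
    if not seqs:
        raise ValueError("select_center needs at least one sequence")
    vectors = [composition_vector(s) for s in seqs]
    n = len(vectors)
    mean = tuple(sum(v[i] for v in vectors) / n for i in range(len(ALPHABET)))

    def squared_distance(v):
        return sum((a - b) ** 2 for a, b in zip(v, mean))

    return min(range(n), key=lambda i: squared_distance(vectors[i]))
```

For example, `select_center(["AAAA", "ACGT", "TTTT"])` returns index 1, because the balanced sequence `ACGT` lies closest to the mean composition. By contrast, the classic center-star strategy scores every pair of sequences, which grows quadratically with the number of sequences.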
Keywords/Search Tags:enhancers identification, deep learning, Bi-directional Long-Short Term Memory, multiple sequence alignment, center-star strategy