An Enhancer Identification Algorithm Based On Deep Learning

Posted on:2019-08-03

Degree:Master

Type:Thesis

Country:China

Candidate:D Dong

Full Text:PDF

GTID:2370330611993339

Subject:Computer Science and Technology

Abstract/Summary:


In this thesis, we study algorithms for enhancer identification. An enhancer is a cis-acting element in the non-coding regions of genes that regulates the transcription frequency of its target genes. Enhancers strongly influence phenotypic differences, biological evolution, disease incidence, and more. Characteristics such as acting over long distances, orientation independence, and cell specificity make enhancers difficult to identify. Existing identification methods are either time-consuming, laborious experimental techniques or traditional machine learning algorithms that rely on complex and often unsatisfactory hand-crafted feature extraction. Based on deep learning, this thesis designs an artificial neural network named BiLSTM-E for large-scale prediction of enhancers across the whole human genome.

From a data-mining perspective, the similarity between the sequences in a training set determines whether a model can learn generalizable information. Multiple sequence alignment (MSA) is a technique for measuring the similarity among multiple sequences. However, no available MSA algorithm can align large-scale sequence sets both quickly and accurately. This thesis therefore develops VCSRA, an MSA algorithm for very large data sets, which provides a data set selection method for BiLSTM-E. The research comprises the following three parts:

1. We optimized the center-star strategy commonly used in MSA algorithms by means of vector-valued mapping. The new center-star strategy, VCS, maps sequences into four-dimensional vectors and can select the center sequence in linear time, which greatly reduces alignment time without loss of accuracy.
2. Building on VCS, we implemented the MSA algorithm VCSRA and parallelized it with MPI/OpenMP. Experiments showed that VCSRA achieves a speedup of about 86-fold and outperforms mainstream MSA algorithms. In addition, VCSRA is suitable for aligning sequences of any length and similarity.
3. We focused on building the deep learning model BiLSTM-E for enhancer prediction. BiLSTM-E takes DNA sequences directly as input. By optimizing the model structure and tuning its hyperparameters, we enabled BiLSTM-E to learn enhancers, that is, the network converges during training. Extensive tests showed that BiLSTM-E outperforms mainstream identification models on the evaluated performance metrics: its accuracy is at least 90.4% and its AUC is above 0.924. The tests also showed that BiLSTM-E generalizes well.
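The abstract does not say which four coordinates VCS uses or exactly how the center is picked from them. The sketch below assumes, purely for illustration, that each sequence maps to its A/C/G/T base-frequency vector and that the center is the sequence whose vector lies nearest the mean vector m. Under squared Euclidean distance that choice also minimizes the total squared distance to every other sequence, because sum_j ||x_i - x_j||^2 = n·||x_i - m||^2 + sum_j ||x_j - m||^2, so a single linear pass replaces the quadratic number of pairwise alignments that the classic center-star strategy uses to choose its center. The names `to_vector` and `select_center` are hypothetical.

```python
from typing import List, Sequence

BASES = "ACGT"


def to_vector(seq: str) -> List[float]:
    """Hypothetical 4-D mapping: relative frequencies of A, C, G, T.

    This mapping is an illustrative assumption, not necessarily the one used by VCS.
    """
    counts = [0, 0, 0, 0]
    for ch in seq.upper():
        idx = BASES.find(ch)
        if idx >= 0:  # skip N and other non-ACGT symbols
            counts[idx] += 1
    total = sum(counts) or 1  # avoid division by zero for empty or all-N input
    return [c / total for c in counts]


def select_center(seqs: Sequence[str]) -> int:
    """Index of the sequence whose vector is nearest the mean vector.

    With squared Euclidean distance this index also minimizes the sum of
    squared distances to all sequences, so no pairwise comparison is needed.
    Cost is linear in the total length of the input.
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    vecs = [to_vector(s) for s in seqs]
    n = len(vecs)
    mean = [sum(v[k] for v in vecs) / n for k in range(4)]

    def sq_dist_to_mean(v: List[float]) -> float:
        return sum((v[k] - mean[k]) ** 2 for k in range(4))

    return min(range(n), key=lambda i: sq_dist_to_mean(vecs[i]))
```

For example, `select_center(["AAAA", "ACGT", "CCCC"])` returns 1, since the balanced sequence sits closest to the mean composition.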
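Once a center is chosen, a center-star MSA aligns every sequence to it and merges the results. As a hedged illustration of that generic procedure (not VCSRA itself, whose pairwise aligner and scoring are not described in the abstract), the sketch below uses a basic Needleman-Wunsch global alignment with hypothetical scores (match +1, mismatch -1, gap -1) and merges the pairwise alignments under the "once a gap, always a gap" rule. `center_idx` is assumed to come from a separate center-selection step; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def nw_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> Tuple[str, str]:
    """Global (Needleman-Wunsch) alignment of a and b; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def gap_profile(center_aln: str, center_len: int) -> List[int]:
    """gaps[k] = gap columns inserted before center residue k (k == center_len: trailing)."""
    gaps = [0] * (center_len + 1)
    k = 0
    for ch in center_aln:
        if ch == "-":
            gaps[k] += 1
        else:
            k += 1
    return gaps


def center_star_msa(seqs: Sequence[str], center_idx: int) -> List[str]:
    """Align all sequences to seqs[center_idx] and merge into one MSA."""
    center = seqs[center_idx]
    length = len(center)
    pairs = [nw_align(center, s) for s in seqs]  # independent, hence parallelizable
    profiles = [gap_profile(c_aln, length) for c_aln, _ in pairs]
    max_gaps = [max(p[k] for p in profiles) for k in range(length + 1)]
    rows = []
    for (_, s_aln), prof in zip(pairs, profiles):
        row, pos = [], 0
        for k in range(length + 1):
            # Copy this alignment's own insertion columns before center residue k.
            row.extend(s_aln[pos:pos + prof[k]])
            pos += prof[k]
            # Pad so every row has max_gaps[k] insertion columns here.
            row.extend("-" * (max_gaps[k] - prof[k]))
            if k < length:  # the column aligned with center residue k
                row.append(s_aln[pos])
                pos += 1
        rows.append("".join(row))
    return rows
```

Each `nw_align` call in `center_star_msa` is independent of the others, which is the kind of work that can be distributed across MPI processes or OpenMP threads; how VCSRA actually partitions its work is not stated in the abstract.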
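Point 3 says BiLSTM-E reads DNA sequences directly. A common way to do that, assumed here rather than taken from the thesis, is to one-hot encode each base, run one LSTM forward and one backward over the sequence, and concatenate the two final hidden states before a sigmoid output. The pure-Python forward pass below uses random, untrained weights and a hypothetical hidden size; the actual BiLSTM-E layer count, sizes, and training procedure are not given in the abstract, and the class names are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def one_hot(seq: str) -> List[Vector]:
    """Encode DNA as 4-D one-hot vectors (A, C, G, T); other symbols become all zeros."""
    table = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = []
    for ch in seq.upper():
        v = [0.0] * 4
        if ch in table:
            v[table[ch]] = 1.0
        out.append(v)
    return out


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class LSTMDirection:
    """One LSTM direction with randomly initialized (untrained) weights."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.h = hidden_dim
        cols = input_dim + hidden_dim  # each step sees [x_t, h_{t-1}]
        # Rows for the four gates stacked: input, forget, candidate, output.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(4 * hidden_dim)]
        self.b = [0.0] * (4 * hidden_dim)

    def run(self, xs: List[Vector]) -> Vector:
        H = self.h
        h, c = [0.0] * H, [0.0] * H
        for x in xs:
            inp = x + h  # list concatenation
            z = [sum(w * v for w, v in zip(row, inp)) + b for row, b in zip(self.W, self.b)]
            i = [sigmoid(v) for v in z[0:H]]
            f = [sigmoid(v) for v in z[H:2 * H]]
            g = [math.tanh(v) for v in z[2 * H:3 * H]]
            o = [sigmoid(v) for v in z[3 * H:4 * H]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h  # final hidden state


class BiLSTMClassifier:
    """Forward + backward LSTM over one-hot DNA, then a sigmoid enhancer score."""

    def __init__(self, hidden_dim: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.fwd = LSTMDirection(4, hidden_dim, rng)
        self.bwd = LSTMDirection(4, hidden_dim, rng)
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_dim)]
        self.b_out = 0.0

    def predict_proba(self, seq: str) -> float:
        xs = one_hot(seq)
        feat = self.fwd.run(xs) + self.bwd.run(list(reversed(xs)))
        return sigmoid(sum(w * v for w, v in zip(self.w_out, feat)) + self.b_out)
```

Reading the sequence in both directions lets the final features reflect context on either side of every position, which suits elements such as enhancers whose activity does not depend on orientation; training (for example with binary cross-entropy) is out of scope here.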
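The reported figures (accuracy of at least 90.4%, AUC above 0.924) are standard binary-classification metrics. A minimal sketch of how they are computed from labels and predicted scores follows; the label convention (1 = enhancer, 0 = non-enhancer) and the 0.5 decision threshold are assumptions, and the function names are hypothetical. AUC here is the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded score matches the 0/1 label."""
    if len(labels) != len(scores) or not labels:
        raise ValueError("labels and scores must be non-empty and equal in length")
    correct = sum(1 for y, s in zip(labels, scores) if (s >= threshold) == (y == 1))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the pairwise (Mann-Whitney) definition; O(P * N) for clarity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

For labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.6, 0.1]`, `accuracy` returns 0.5 while `roc_auc` returns 0.75, which shows why the two metrics are usually reported together: AUC is threshold-free and measures ranking quality alone.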

Keywords/Search Tags:

enhancer identification, deep learning, Bidirectional Long Short-Term Memory, multiple sequence alignment, center-star strategy


Related items

1	The Design And Implementation Of A Multiple Sequence Alignment Algorithm Based On Suffix Tree Strategy
2	Analysis And Application Of Deep Learning Long-term And Short-term Memory Algorithms And Monte Carlo Method
3	Precise Identification Of RNA Editing Sites From DNA Sequence Data Based On Deep Learning Methods
4	Precipitation Forecast Spatiotemporal Sequence Prediction Research Based On The Fusion Of Deep Learning And Ensemble Learning
5	Research On Flash Flood Forecasting Based On Long Short-Term Memory Networks
6	Reconstruction Of Central Arterial Pressure Signal Based On Long Short-term Memory Network
7	Research On Short Term Forecast Of Fog Based On Deep-Learning
8	Application Of Long Short-term Memory Network In Short-term Rainfall
9	Research On Meteorological Prediction Based On Long Short-term Memory Network
10	An Interpretable Deep Learning Model For Surface Hydrological Processes At Multiple Time Scales