Predicting The DNA Sequence Specificity Based On K-mer Vector

Posted on: 2019-12-11  Degree: Master  Type: Thesis
Country: China  Candidate: L Q Huang
GTID: 2428330545451218  Subject: Software engineering
Abstract/Summary:
DNA is the genetic carrier of life. DNA sequence specificity refers to the ability of a DNA sequence to bind a specific protein, and it plays a crucial role in gene regulation. Identifying DNA sequence specificity with biochemical approaches costs a great deal of time and money. This thesis studies how to predict DNA sequence specificity with deep learning technologies. First, a DNA word vector model is constructed and trained: we segment DNA sequences into 3-mer words and apply the Word2vec model to learn the word vectors of these sequences. The trained word vectors agree well with the relevant biochemical semantics. Second, a deep learning model that predicts DNA sequence specificity from the DNA word vectors is built. A convolutional neural network captures the local features of the DNA word sequence, and a bidirectional recurrent neural network then captures its global features. These automatically extracted features are fed, together with other encoded features such as the proteins, into a multi-layer perceptron that is trained as a classifier. Compared with the state-of-the-art method, our model achieves very competitive results: the average AUC on the test set increases by 5%. In addition, our model identifies different binding proteins more robustly.
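The data preparation behind the word vector model (cutting a sequence into 3-mer "words" and producing the training pairs that Word2vec learns from) can be sketched in plain Python, together with the rank-based AUC used to report the results. This is a minimal sketch under stated assumptions, not the thesis's implementation: all function names are hypothetical, the k-mers are assumed to overlap with stride 1 (the abstract does not give the stride), and the skip-gram window size is an arbitrary example value.

```python
from itertools import product


def kmer_tokenize(seq: str, k: int = 3, stride: int = 1) -> list[str]:
    """Split a DNA sequence into k-mer "words".

    Assumption: overlapping k-mers with stride 1; the thesis abstract
    does not state which stride was used.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(k: int = 3) -> dict[str, int]:
    """Map every k-mer over {A, C, G, T} to an integer id (4**k words, 64 for k=3)."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def skipgram_pairs(words: list[str], window: int = 2) -> list[tuple[str, str]]:
    """Generate (target, context) pairs as consumed by Word2vec skip-gram training.

    Every word within `window` positions of the target (excluding the
    target itself) is emitted as one context word.
    """
    pairs = []
    for i, target in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, words[j]))
    return pairs


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Rank-based AUC: the probability that a random positive outscores a random negative.

    Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    words = kmer_tokenize("ACGTAC")
    print(words)  # ['ACG', 'CGT', 'GTA', 'TAC']
    print(len(build_vocab()))  # 64
    print(skipgram_pairs(words[:3], window=1))
    print(roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.5]))  # 0.75
```

In practice the tokenized sequences would be passed to an existing Word2vec trainer (for example gensim's `Word2Vec` with `sg=1` for skip-gram) rather than using hand-built pairs; the pair generator is shown only to make the training objective concrete. The learned 3-mer vectors would then form the input sequence for the CNN and bidirectional RNN layers described above.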
Keywords/Search Tags: Word Vector, Deep Learning, Convolutional Neural Network, Bi-directional Recurrent Neural Network, DNA Sequence Specificity