Predicting The DNA Sequence Specificity Based On Deep Learning

Posted on: 2022-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
GTID: 2480306605465584
Subject: Computer Science and Technology
Abstract/Summary:
DNA sequence specificity refers to the ability of a DNA sequence to bind a specific protein. Such proteins play a central role in gene regulation, including transcription and alternative splicing. Characterizing DNA sequence specificity is essential for building regulatory models of biological systems and for identifying pathogenic variants. Depending on how much of the sequence is considered, DNA sequence specificity can be characterized at the motif level or at the sequence level, and both approaches currently have open problems. A motif is a sequence pattern shared by the DNA fragments that bind a specific protein. Many motif mining algorithms have been proposed; they perform well when the motif length is given, but determining the motif length accurately remains an open problem. In recent years, to take the information in the entire sequence into account, DNA sequence specificity has also been characterized at the sequence level with deep learning. Such methods already achieve high prediction accuracy, but there is still room for improvement.

This thesis addresses the problems of both characterization approaches using deep learning. For characterization at the motif level, a motif length prediction method based on deep learning is constructed. First, a method for constructing the sample data used to predict motif length is proposed; second, a motif length prediction model based on a convolutional neural network is built; third, ways of applying the prediction model are given. Experimental results show that the prediction accuracy of the proposed method on the test set exceeds 90%, that it can successfully refine the motifs found by existing motif mining algorithms, and that it can effectively improve the time performance of those algorithms.

For characterization at the sequence level, a deep learning method for predicting sequence specificity is built on the pre-trained model BERT and incorporates common factor information. First, a BERT-based method for predicting DNA sequence specificity is proposed; second, a method for computing a common factor vector is proposed; third, a method for fusing the common factor vector into the prediction model as a feature is proposed. Experimental results show that the proposed method achieves an AUC of about 0.95 on the test set, which is better than existing algorithms. The results also support the need for dynamic encoding of DNA characters: the same base should be interpreted differently depending on the DNA sequence in which it appears.
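The contrast between static and dynamic encoding of DNA characters can be illustrated with a minimal sketch. The abstract does not say how sequences are fed to the convolutional network or to BERT, so both functions below are assumptions chosen for illustration: `one_hot` is a common static encoding in which a base always maps to the same vector, and `kmer_tokens` produces overlapping k-mer tokens of the kind a BERT-style model could embed in a context-dependent way. The function names and the default `k = 3` are hypothetical and are not taken from the thesis.

```python
from typing import List

BASES = "ACGT"  # fixed column order for the one-hot vectors


def one_hot(seq: str) -> List[List[int]]:
    """Static encoding: each base maps to the same 4-dimensional vector,
    no matter which sequence it appears in (assumed input for a CNN)."""
    rows = []
    for base in seq.upper():
        if base in BASES:
            rows.append([int(base == b) for b in BASES])
        else:
            rows.append([0, 0, 0, 0])  # unknown base such as N: all zeros
    return rows


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers with stride 1.
    A BERT-style model would map these tokens to contextual embeddings,
    so the same base can be represented differently in different sequences."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    print(one_hot("ACGT"))
    print(kmer_tokens("ACGTAC", k=3))
```

With one-hot encoding, the base `A` receives an identical vector in every sequence. With k-mer tokens passed through a contextual model, the representation of a position depends on its neighbours, which is the property the abstract calls dynamic encoding.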
Keywords/Search Tags: DNA sequence specificity, motif, deep learning, common factor