
Research on the Prediction of N6-methyladenosine (m6A) Sites Based on Convolutional Neural Networks and Multiple Sequence Encoding Schemes

Posted on: 2019-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: P W Xing
GTID: 2370330626452392
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
N6-methyladenosine (m6A) refers to methylation of adenosine at the nitrogen-6 position. It plays an important role in a series of biological processes, such as splicing, mRNA export, nascent mRNA synthesis, nuclear translocation, and translation. Since high-resolution mapping of m6A sites was established, numerous experiments have successfully characterized m6A sites within sequences. However, with the explosive growth of genomic sequence data, identifying m6A sites experimentally is time-consuming and expensive. Many conventional computational methods for identifying m6A sites are limited by data scale. With thousands of m6A sites now detected by high-throughput sequencing, it has become possible to discover the characteristics of m6A sequences using deep learning techniques. The main research contents of this thesis are as follows:
(1) Two preliminary studies on m6A site recognition are introduced. The first proposes a machine learning prediction model based on position specificity and a support vector machine with multi-interval nucleotides. The second extracts high-level abstract sequence features with a deep belief network and builds a computational model that identifies m6A sites by combining traditional features with these deep learning features.
(2) Four RNA sequence encoding schemes are introduced: one-hot encoding, encoding based on neighboring site states, word embedding encoding, and gene2vec encoding. Gene sequences are segmented into RNA pseudo words, which are then represented by vectors learned with an NLP word embedding model.
(3) Site data encoded with the four schemes are classified by four one-dimensional CNNs with different hyperparameters and network structures.
(4) The convolution kernels of the first CNN layer are used to scan for high-frequency sequence motifs, which are compared with existing motifs, demonstrating the interpretability and visualization of deep learning in gene sequence representation.
(5) An online m6A site prediction platform is developed that supports multiple encoding schemes and deep learning prediction modes, and a recompiled data set is provided for subsequent researchers.
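As an illustration of two of the encoding ideas named in item (2), the sketch below shows one-hot encoding of an RNA window and segmentation of a sequence into overlapping k-mers ("pseudo words") of the kind a word embedding model could be trained on. It is not the thesis code. The function names, the `ACGU` alphabet order, the all-zero vector for unknown symbols, and the choices `k=3` and `stride=1` are assumptions made here for illustration only.

```python
# Illustrative sketch only: names, alphabet order, k and stride are assumptions,
# not details taken from the thesis.
NUCLEOTIDES = "ACGU"


def one_hot_encode(seq):
    """Map each nucleotide to a 4-dimensional indicator vector.

    Symbols outside the assumed ACGU alphabet (e.g. 'N') become all zeros.
    """
    index = {nt: i for i, nt in enumerate(NUCLEOTIDES)}
    rows = []
    for nt in seq.upper():
        row = [0] * len(NUCLEOTIDES)
        if nt in index:
            row[index[nt]] = 1
        rows.append(row)
    return rows


def to_pseudo_words(seq, k=3, stride=1):
    """Split a sequence into k-mers ("pseudo words"), stepping by `stride`."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


if __name__ == "__main__":
    window = "GGACU"  # a short example window, not real data
    print(one_hot_encode(window))
    print(to_pseudo_words(window, k=3))
```

The one-hot matrix (sequence length by 4) is the kind of input a one-dimensional CNN can consume directly, while the list of pseudo words would first be passed through an embedding lookup to obtain dense vectors.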
Keywords/Search Tags: N6-methyladenosine, Site Prediction, Deep Learning, Word Embedding