MicroRNA Subcellular Localization Based On Deep Mining Of Sequence Patterns

Posted on: 2020-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Xiao
GTID: 2370330623463647
Subject: Computer technology
Abstract/Summary:
MicroRNAs (miRNAs) are a class of short non-coding RNAs (~22 nucleotides) present in animals and plants. MiRNAs participate in a variety of cellular processes, including development, proliferation, differentiation and metabolism, and play important roles in post-transcriptional gene regulation. In particular, miRNAs have been shown to be prognostic biomarkers and drug targets for complex diseases. The subcellular localization of miRNAs is closely related to their biological functions. Recent studies have found that miRNAs can localize to various cellular compartments and show diverse localization patterns in cells. However, to the best of our knowledge, no computational tool for predicting miRNA subcellular locations has been available to date. The major reason is that the lack of useful information sources severely limits the prediction performance of traditional statistical learning approaches.

We analyzed the individual modules of a miRNA subcellular classification model. For the representation of input miRNA sequences, we explored a variety of word segmentation and sequence representation schemes. For the serialization of the output, we explored a variety of multi-label classification methods and proposed an entropy-based serialization method. To address the scarcity of miRNA features, we introduced a method for calculating the Gene Ontology (GO) similarity of miRNAs and used matrix decomposition techniques to extract GO similarity as a feature of miRNAs.

Overall, this study treats the prediction task as a Sequence-to-Sequence learning process and proposes an attention-based encoder-decoder model, miRLocator, to identify the subcellular locations of human miRNAs. miRLocator uses a bidirectional long short-term memory (BiLSTM) module to encode the input sequences and an LSTM module to decode the resulting context vectors into location sets. In particular, a new encoding method for RNAs, RNA2Vec, and an entropy-based method are incorporated in the model to determine the input and output representations, respectively. In addition, we added some biological features to improve the model's performance. The experimental results show that miRLocator achieves promising prediction accuracy with limited input information and outperforms both models that use hand-designed features and conventional RNN models.
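
The abstract does not spell out how sequences are segmented into words or how location sets are serialized, so the following is only a minimal sketch under stated assumptions, not the thesis implementation. It assumes that word segmentation means overlapping k-mers (k = 3 here is arbitrary) and that the entropy-based serialization orders labels by the binary entropy of their frequency in the training set, lowest first. The function names, the toy sequence, and the location names are all hypothetical.

```python
import math
from collections import Counter


def kmer_tokens(seq, k=3, stride=1):
    """Split an RNA sequence into overlapping k-mer 'words' (assumed segmentation)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def binary_entropy(p):
    """Entropy in bits of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def entropy_label_order(label_sets):
    """Rank labels by the entropy of their presence/absence in the training set.

    Assumption: low-entropy (most predictable) labels are emitted first.
    Ties are broken alphabetically so the order is deterministic.
    """
    n = len(label_sets)
    counts = Counter(label for labels in label_sets for label in set(labels))
    entropy = {label: binary_entropy(c / n) for label, c in counts.items()}
    return sorted(entropy, key=lambda label: (entropy[label], label))


def serialize(labels, order):
    """Turn an unordered label set into a target sequence following `order`.

    Raises KeyError for a label that never appeared in the training set.
    """
    rank = {label: i for i, label in enumerate(order)}
    return sorted(set(labels), key=rank.__getitem__)
```

With the toy sequence `UGAGGUAG` and k = 3, `kmer_tokens` yields six words (`UGA`, `GAG`, `AGG`, `GGU`, `GUA`, `UAG`). A sequence-to-sequence decoder needs an ordered target, which is why an unordered location set has to be serialized in a fixed order before training; the ordering rule shown is one plausible choice among several.
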
Keywords/Search Tags: miRNA, subcellular localization, RNA2Vec, Sequence-to-Sequence