LncRNA Tissue Specificity Prediction Based On Deep Learning

Posted on: 2022-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: K Duan
Full Text: PDF
GTID: 2480306536454714
Subject: Computer technology
Abstract/Summary:
Long non-coding RNA (lncRNA) is a long and diverse class of non-coding RNA that lacks protein-coding ability. Its length ranges from two hundred nucleotides to hundreds of thousands of nucleotides. LncRNA plays an important role in cell development and metabolism, and determining its expression specificity in different tissues or subcellular locations is important for understanding its biochemical function. The number of lncRNAs is extremely large, and verifying expression specificity through traditional biological experiments is expensive and time-consuming. It is therefore particularly important to predict lncRNA tissue specificity from the lncRNA sequence with an effective and fast computational method.

In lncRNA-related prediction methods, the main problems are feature extraction and task model construction. This thesis first proposes a novel feature extraction method based on the lncRNA sequence, and then proposes a tissue specificity prediction method that combines it with deep learning models. Feature extraction uses the discrete Fourier transform: first, the lncRNA is encoded as a numerical sequence; then the numerical sequence is mapped from the time domain to the frequency domain by the discrete Fourier transform; finally, the feature vector is obtained by truncating the frequency-domain representation to a fixed-length vector. For task model construction, the task type is extended to a multi-output model according to the dimension of the label data. The vgg16_bn and resnet18 convolutional neural networks, in one-dimensional convolution form, are used as the backbones. Based on these two networks, three task models are constructed: a multi-output regression model, a multi-output multi-class classification model, and a multi-label classification model.

In multi-output regression, both models obtain a good average L1 loss: 0.1449 for the vgg16_bn model and 0.1636 for the resnet18 model. In multi-output multi-class classification, both models achieve good accuracy on different tissues; for example, the accuracy on the live_R tissue reaches 0.8580. In multi-label classification, both models achieve good overall accuracy and precision: 0.7101 and 0.5836, respectively, for the vgg16_bn model, and 0.7069 and 0.5789, respectively, for the resnet18 model. In summary, the experimental results of the method proposed in this thesis are relatively accurate.
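
The three feature extraction steps described in the abstract (numerical encoding, discrete Fourier transform, fixed-length truncation) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the nucleotide-to-number mapping, the use of magnitudes, the zero-padding of short sequences, and the names `NUCLEOTIDE_CODES`, `encode_sequence`, `dft_magnitudes`, and `dft_features` are all hypothetical, since the abstract does not specify them.

```python
import cmath

# Hypothetical mapping: the abstract does not say how nucleotides are encoded.
NUCLEOTIDE_CODES = {"A": 1.0, "C": 2.0, "G": 3.0, "U": 4.0, "T": 4.0}


def encode_sequence(seq):
    """Map each nucleotide to a number; unknown symbols (e.g. N) become 0.0."""
    return [NUCLEOTIDE_CODES.get(base, 0.0) for base in seq.upper()]


def dft_magnitudes(signal):
    """Naive O(n^2) discrete Fourier transform, returning each bin's magnitude."""
    n = len(signal)
    magnitudes = []
    for k in range(n):
        total = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        magnitudes.append(abs(total))
    return magnitudes


def dft_features(seq, length=8):
    """Encode, transform, then truncate (or zero-pad, an assumption) to `length`."""
    magnitudes = dft_magnitudes(encode_sequence(seq))[:length]
    return magnitudes + [0.0] * (length - len(magnitudes))


features = dft_features("ACGUACGU", length=4)
```

The naive transform is written out only to keep the sketch dependency-free; for real sequences of thousands of nucleotides, a fast Fourier transform routine would be the practical choice. Fixing the output length is what lets sequences of very different lengths feed the same one-dimensional convolutional network.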
Keywords/Search Tags: long non-coding RNA, tissue specificity, deep learning, multi-output, discrete Fourier transform