Research On Protein Function And RNA Interaction Prediction Based On Deep Learning

Posted on: 2022-04-19 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: R B Gao | Full Text: PDF
GTID: 1480306536499044 | Subject: Software engineering
Abstract/Summary:
Over the past ten years, the number of known proteins has grown rapidly, and new drug development has placed higher demands on protein function prediction, so computational models and protein feature representations for enzyme function prediction have become particularly important. Because some proteins perform multiple functions when they take part in different reactions, traditional methods struggle to characterize the functions of newly discovered enzymes in chemical reactions, and only a very small number of proteins have experimental evidence supporting their gene annotations. RNA plays an important role in many biological processes, and its function is achieved mainly by binding to a variety of proteins. However, as the complexity of RNA-protein interaction (RPI) networks increases, high-throughput experimental techniques remain expensive and time-consuming, so fast and reliable computational methods for predicting RNA-protein interactions are urgently needed. This dissertation therefore addresses the following aspects.

First, to address the problem that existing methods obtain only a single kind of biological feature information, we propose a method that constructs a specific scoring matrix to express the position information of amino acids, and that constructs a matrix of protein network structure information using random walks. To obtain better prediction performance, we use one-hot encoding to extract additional sequence information, which lays a foundation for protein function prediction.

Second, because existing enzyme function prediction methods use only protein geometry or only protein sequence, we propose a model built from multiple deep convolutional neural networks (DCNNs), named ADL-DCNN, that predicts the function of single-label enzymes while considering both structure and sequence information. A position scoring matrix extracts mutation information from the amino acid sequence to express the sequence information, and the distances and angles of the amino acids express the structural information. To make the features easier for the network to learn, the extracted sequence and structural features are each converted into histograms. Three deep convolutional neural networks learn the three extracted features, one network per feature, and two different architectures are used to fuse the features output by the networks. Finally, the prediction results are obtained through a KNN classifier.

Third, to address the low accuracy of existing methods that predict the function of multi-label proteins through structural networks, we propose a multi-function protein prediction method based on SVM and a sequence kernel similarity matrix, named SVM-SKSM. We employ the position-specific scoring matrix (PSSM) and Gaussian kernel similarity to obtain similarity information between sequences, and introduce deep autoencoders to obtain structural interaction information. The class probabilities of the sequences are obtained by the maximum similarity probability, and the class probability features of the structures are obtained with an SVM classifier. We fuse the two sets of class probabilities by linear combination and feed the fused probability vector into an SVM for a second classification step, which realizes multi-label protein function prediction.

Fourth, because existing computational RNA-protein interaction models are deficient in handling multiple kinds of biological information, we propose a hybrid deep learning model, RPI-MCNNBLSTM, which combines three convolutional neural networks (CNNs) with a BLSTM network to predict RNA-protein interactions from several kinds of biological information, including protein sequence, RNA sequence, and structure. We use a padding method to bring sequences and structures to equal length and then numerically encode the equal-length sequences and structures, which makes them suitable for the subsequent convolution operations. The three CNNs learn the three kinds of biological information separately, and the BLSTM then captures the long-range dependencies among the three features identified by the CNNs. The learned weighted representations are fed into a classification layer to predict ncRNA-protein interactions.

Finally, the proposed methods were evaluated on 43,843 enzymes, on yeast and human gene datasets, and on 6 RPI (RNA-protein interaction) datasets, and the experimental results were compared with those of existing methods. The experiments verify the effectiveness of the methods proposed in this dissertation.
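
As an illustration of the equal-length padding (filling) and numerical encoding step described for RPI-MCNNBLSTM, the following is a minimal sketch only, not the dissertation's implementation. The abstract does not state the alphabet, the padding symbol, the target length, or the encoding scheme; the four-letter RNA alphabet, the "-" padding character, right-padding, and one-hot rows used here are all assumptions, and the function names are hypothetical.

```python
# Assumed alphabet and padding symbol; neither is specified in the abstract.
RNA_ALPHABET = "ACGU"
PAD = "-"


def pad_sequences(seqs, length=None, pad_char=PAD):
    """Right-pad (or truncate) every sequence to one common length.

    If no length is given, the longest sequence in the batch sets it.
    """
    target = length if length is not None else max(len(s) for s in seqs)
    return [s[:target].ljust(target, pad_char) for s in seqs]


def one_hot(seq, alphabet=RNA_ALPHABET):
    """Encode a sequence as a list of rows, one row per position.

    Each row has one entry per alphabet symbol. Padding and any other
    symbol outside the alphabet become all-zero rows.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in seq:
        row = [0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1
        rows.append(row)
    return rows


def encode_batch(seqs, alphabet=RNA_ALPHABET, length=None):
    """Pad a batch to equal length, then one-hot encode each sequence."""
    return [one_hot(s, alphabet) for s in pad_sequences(seqs, length)]


if __name__ == "__main__":
    batch = encode_batch(["ACGU", "AC"])
    for matrix in batch:
        print(matrix)
```

Every encoded sequence in a batch has the same number of rows, which is the property a convolutional layer needs. A protein alphabet could be passed through the `alphabet` parameter in the same way, provided the padding character is not one of its symbols.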
Keywords/Search Tags: Deep learning, BLSTM, SVM, Protein function prediction, Amino acid sequence, Mutation information, Gaussian kernel similarity, Maximum similarity probability, RNA-protein interactions