Research On The Prediction Of Human Disease Related Problems Based On Convolutional Neural Networks

Posted on: 2022-04-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zhang
GTID: 1484306734498234
Subject: Mathematics
Abstract/Summary:
Disease is an abnormal life process that arises when some cause disturbs the body's homeostasis. From the perspective of molecular biology, disease is caused by changes in the quality and quantity of human proteins. These changes can disrupt cell function and eventually cause certain tissues of the human body to work abnormally. Although methods based on molecular biology experiments yield more accurate results, they are time-consuming and inefficient, and cannot be verified on all datasets. Computer-aided methods, by contrast, are low-cost and efficient and can perform predictive analysis on unknown information, which greatly compensates for the shortcomings of molecular biology experiments and can offer them some guidance. Therefore, we focus our research on computer-aided association prediction for human-related diseases. To the best of our knowledge, there are many prediction topics related to human diseases. In this thesis, we select three topics of wide concern for in-depth study: the prediction of ribosome stalling sites, the prediction of protein-protein interactions between humans and viruses (HVPPI), and the prediction of associations between long non-coding RNAs (lncRNAs) and diseases. Because convolutional neural networks (CNNs) extract features automatically and classify input features in a translation-invariant way, we mainly use CNNs to study these three topics. The main work of this thesis can be summarized as follows:

(1) For the prediction of ribosome stalling sites, we propose a new method based on a multi-feature convolutional neural network, named DeepRibSt. First, because existing methods select overly simple features and ignore the effect of useful biological prior knowledge on feature extraction in deep learning, we start from a biological point of view and extract three new types of features (sequence conservation, hydrophobicity, and amino acid dissociation constant), then use a new multi-feature fusion method to fuse them with the common features. Second, because existing deep learning models are too simple to handle the noise introduced by multi-feature fusion and struggle to explore the inherent correlations between different feature descriptions, we increase the number of CNN channels and the depth of the network to construct a new CNN-based model for comprehensive feature extraction. Finally, we evaluate DeepRibSt on five datasets, including human and yeast data, and compare it with common deep learning models and state-of-the-art methods. The results verify that DeepRibSt achieves the best results on multiple evaluation metrics and is effective for predicting ribosome stalling sites.

(2) For HVPPI prediction, we propose a new method based on UniRep protein representations and convolutional neural networks. First, because existing methods use insufficient experimental data and do not account for the bias that sequence homology introduces into experimental results, we collect all HVPPI experimental data from seven authoritative databases and use the blastp algorithm to remove homologous sequences, constructing a non-redundant, non-homologous experimental dataset. Second, because existing methods do not consider the hidden-layer features inherent in the protein sequence itself, we use the UniRep method to extract a representation matrix for each protein, which captures the intrinsic secondary and tertiary structural features of the sequence. Third, we design a new CNN-based model that deeply integrates the several types of extracted features for HVPPI prediction. Finally, to verify the superiority of the proposed method, we compare it with common deep learning models and state-of-the-art HVPPI prediction approaches. The experimental results show that the proposed method outperforms existing methods and predicts HVPPIs accurately.

(3) For the prediction of associations between lncRNAs and diseases, we propose two novel prediction algorithms, LDNFSGB and MCA-Net, based on traditional machine learning and on deep learning, respectively. In LDNFSGB, six commonly used similarity feature extraction methods are first considered, and an effective fusion method then fuses the extracted features into a new feature matrix. Second, we use an autoencoder to reduce the feature dimension and obtain a representative low-dimensional feature matrix. Third, the gradient boosting algorithm is used to predict lncRNA-disease associations. Finally, we use three validation methods to evaluate LDNFSGB on three datasets. The results show that LDNFSGB outperforms existing machine learning-based prediction methods. In addition, case studies of several common diseases further verify its effectiveness.

In MCA-Net, we first propose a new multi-feature fusion method that deeply merges the six feature similarity matrices into a new feature matrix for comprehensive feature extraction of lncRNAs and diseases. Second, because the attention mechanism can assign different weights to features according to their importance, we construct an attention-based convolutional layer module for feature learning. Together, these yield a new lncRNA-disease association prediction method based on multi-feature fusion and an attention convolutional neural network. Comparison with common deep learning models, LDNFSGB, and the latest deep learning-based prediction methods verifies that MCA-Net achieves the best prediction performance on the three datasets. To further verify the effectiveness of MCA-Net, case studies on some common diseases are conducted. The experimental results indicate that MCA-Net is an accurate and stable lncRNA-disease association prediction algorithm.
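
The idea that an attention mechanism weights features by importance can be made concrete with a minimal sketch. This is not the MCA-Net module itself, whose architecture the abstract does not specify. It assumes, purely for illustration, that each similarity matrix is treated as one channel, that a channel's importance score is simply its global mean (a parameter-free stand-in for a learned scoring layer), and that the weights come from a softmax over those scores. The function names and the toy matrices are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def channel_attention_fuse(channels):
    """Fuse equally shaped 2-D feature matrices using softmax channel weights.

    ASSUMPTION: the importance score of a channel is its global mean.
    A trained model would learn this scoring function instead.
    """
    rows, cols = len(channels[0]), len(channels[0][0])
    # Squeeze: summarize every channel by a single scalar.
    scores = [sum(sum(row) for row in ch) / (rows * cols) for ch in channels]
    # Attention weights are positive and sum to 1.
    weights = softmax(scores)
    # Weighted element-wise sum of all channels.
    fused = [
        [sum(w * ch[i][j] for w, ch in zip(weights, channels)) for j in range(cols)]
        for i in range(rows)
    ]
    return weights, fused


# Hypothetical toy data: three 2x2 similarity matrices (not from the thesis).
sim_a = [[1.0, 0.2], [0.2, 1.0]]
sim_b = [[1.0, 0.8], [0.8, 1.0]]
sim_c = [[1.0, 0.5], [0.5, 1.0]]

weights, fused = channel_attention_fuse([sim_a, sim_b, sim_c])
print("attention weights:", [round(w, 3) for w in weights])
print("fused matrix:", [[round(v, 3) for v in row] for row in fused])
```

In this sketch the channel with the largest mean similarity receives the largest weight. In a trained network the scores would be learned from data, so the weighting would reflect predictive importance and not raw magnitude.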
Keywords/Search Tags: CNNs, ribosome stalling sites, human-virus protein-protein interaction, lncRNA-disease association, prediction