Research On Protein Function Prediction Based On Iterative Features And Graph Features

Posted on:2024-01-08

Degree:Master

Type:Thesis

Country:China

Candidate:Y Wang

Full Text:PDF

GTID:2530307295451884

Subject:Computer Science and Technology

Abstract/Summary:

Proteins are biological macromolecules with complex structures and diverse functions. They carry out many of the activities of living organisms in a coordinated way and play a central role in life. Protein function prediction, a prominent research problem in the post-genomic era, is crucial for understanding cellular and biological mechanisms, studying disease mechanisms, and discovering new drugs. Within this field, accurate prediction of lysine crotonylation sites and of drug-protein interactions is especially important. With the rapid development of machine learning and deep learning, many computational methods have been proposed, but they still have notable limitations. Crotonylation site prediction typically faces extreme class imbalance, with far fewer positive samples than negative samples, while drug-protein interaction prediction suffers from, among other problems, insufficient learning of diverse biological associations. To address these problems, this thesis proposes two methods. The main contributions are as follows:

(1) SEBP_HNHC, a protein crotonylation site prediction method based on a two-layer stacked-ensemble iteration strategy. First, to make full use of the information in the dataset, a new data partitioning strategy, SED, partitions the training set twice to construct two layers of training subsets. Second, nine encoding methods covering sequence information, physicochemical properties, and matrix-based evolutionary features are used for multi-view feature extraction; to remove redundant information, the optimal feature subset is selected using the Gini coefficient and forward feature selection. Support vector machine (SVM) classifiers trained on the first-layer training data produce initial classification probabilities for the samples. To reduce the effect of sample imbalance on prediction performance, these initial probabilities serve as the feature vectors of the second-layer training data, on which a stacked ensemble model combining seven machine learning algorithms is built. Finally, drawing on ideas from deep learning and ensemble learning, the prediction probabilities obtained from multiple optimal feature subsets are fed as input features to deeper layers, and the final predictions are obtained through iterative ensembling. Experimental results verify the effectiveness and feasibility of SEBP_HNHC.

(2) DMNDTI, a drug-protein interaction prediction method based on dual multi-view network learning. First, additional data are collected to expand the dataset, and auxiliary data such as diseases and side effects are integrated to construct a new large-scale dataset. Of DMNDTI's two views, one learns from heterogeneous networks related to proteins and drugs using meta-paths and denoising autoencoders, while the other learns from drug-protein similarity networks using multi-channel graph convolutional networks. The overall DMNDTI framework consists of four modules: meta-path-based representation learning of the raw drugs and proteins, construction of multi-view drug-protein pair networks, representation learning on three-channel drug-protein pair networks, and drug-protein interaction prediction. Experiments show that DMNDTI achieves good prediction performance.
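The feature-selection step in work (1) can be illustrated with a greedy forward pass over a pre-ranked feature list. This is a minimal sketch under assumptions, not the thesis's code: the ranking (for example by Gini importance) is taken as given, `score` stands for any evaluation function such as cross-validated accuracy, and `forward_select`, `toy_score`, and the feature names are hypothetical. In this variant a feature is kept only when adding it improves the score.

```python
from typing import Callable, Hashable, Sequence


def forward_select(
    ranked: Sequence[Hashable],
    score: Callable[[list], float],
) -> tuple[list, float]:
    """Walk features in ranked order, keeping each one that raises the score."""
    selected: list = []
    best = float("-inf")
    for feature in ranked:
        candidate = selected + [feature]
        s = score(candidate)
        if s > best:  # keep the feature only if the subset gets better
            selected, best = candidate, s
    return selected, best


# Hypothetical toy scorer: "a" and "c" help, "b" and "d" only add noise.
useful = {"a": 0.5, "b": -0.1, "c": 0.2, "d": -0.05}


def toy_score(subset: list) -> float:
    return sum(useful[f] for f in subset)


ranked = ["a", "c", "b", "d"]  # assumed ranking, e.g. by Gini importance
chosen, best = forward_select(ranked, toy_score)
```

Here `chosen` ends up as `["a", "c"]`: adding "b" or "d" after them lowers the toy score, so both are skipped.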
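The two-layer idea in work (1), in which first-layer classifiers map each sample to a vector of probabilities that becomes the second layer's input, can be sketched in NumPy as below. Several substitutions are assumed because the abstract gives no details: splitting the negative class into balanced chunks is only a stand-in for SED, a gradient-descent logistic regression replaces the SVM base learners, and a second logistic regression replaces the seven-algorithm stacked ensemble. All function names and the toy data are hypothetical.

```python
import numpy as np


def fit_logreg(X, y, lr=0.1, epochs=500):
    """Gradient-descent logistic regression (stand-in for an SVM base learner)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y  # gradient of the log-loss with respect to the logits
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


def predict_proba(model, X):
    w, b = model
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


def balanced_subsets(X, y, rng):
    """Hypothetical stand-in for SED: split the negatives into disjoint chunks
    roughly the size of the positive class and pair each chunk with all positives."""
    pos = np.flatnonzero(y == 1)
    if len(pos) == 0:
        raise ValueError("need at least one positive sample")
    neg = rng.permutation(np.flatnonzero(y == 0))
    n_chunks = max(1, len(neg) // len(pos))
    for chunk in np.array_split(neg, n_chunks):
        idx = np.concatenate([pos, chunk])
        yield X[idx], y[idx]


def train_first_layer(X, y, rng):
    """Layer 1: one base classifier per balanced subset."""
    return [fit_logreg(Xs, ys) for Xs, ys in balanced_subsets(X, y, rng)]


def layer_two_features(models, X):
    """Layer-2 input: column j is base model j's positive-class probability."""
    return np.column_stack([predict_proba(m, X) for m in models])


# Hypothetical toy imbalanced data: 40 positives vs. 400 negatives, 5 features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 1.0, (40, 5)), rng.normal(-1.0, 1.0, (400, 5))])
y = np.concatenate([np.ones(40), np.zeros(400)])
order = rng.permutation(len(y))
X, y = X[order], y[order]

# Disjoint halves keep layer-2 training data unseen by layer 1.
half = len(y) // 2
base_models = train_first_layer(X[:half], y[:half], rng)
Z = layer_two_features(base_models, X[half:])
meta_model = fit_logreg(Z, y[half:])  # stand-in for the seven-algorithm stack
scores = predict_proba(meta_model, Z)
```

Training layer 2 on a disjoint half is a simplification; full stacking implementations usually build the second-layer features from out-of-fold predictions so that every sample can be used at both layers.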
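For the meta-path view in work (2), a common first step (assumed here; the abstract does not describe DMNDTI's exact construction) is to turn a heterogeneous relation into a homogeneous similarity matrix. Along a symmetric meta-path such as Drug-Disease-Drug with a binary drug-disease matrix A, the commuting matrix M = A A^T counts the shared diseases of every drug pair, and PathSim normalization 2 M_ij / (M_ii + M_jj) rescales those counts to [0, 1]. The toy matrix and the function names are hypothetical.

```python
import numpy as np


def commuting_matrix(*relations):
    """Multiply relation matrices along a meta-path.

    For Drug-Disease-Drug pass (A, A.T), where A[i, k] = 1 if drug i is
    associated with disease k; entry [i, j] then counts path instances i -> j.
    """
    M = relations[0]
    for R in relations[1:]:
        M = M @ R
    return M


def pathsim(M):
    """PathSim normalization for a symmetric meta-path: 2*M_ij / (M_ii + M_jj)."""
    diag = np.diag(M).astype(float)
    denom = diag[:, None] + diag[None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, 2.0 * M / denom, 0.0)


# Hypothetical toy data: 3 drugs x 4 diseases.
drug_disease = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
])
M = commuting_matrix(drug_disease, drug_disease.T)
S = pathsim(M)
```

Drugs 0 and 1 share one of their two diseases each, so their PathSim score is 2 * 1 / (2 + 2) = 0.5, while drug 2 shares nothing with either.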
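The second view in work (2) applies multi-channel graph convolution to drug-protein pair networks. The NumPy sketch below shows one common way to build such a layer, as an assumption rather than DMNDTI's actual architecture: each channel is a separate adjacency matrix over the same pair nodes (for example, pairs linked under different similarity measures), each channel is propagated with the standard GCN rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), and the channel outputs are concatenated. Weights are random, training is omitted, and the graphs and sizes are hypothetical toy values.

```python
import numpy as np


def normalize_adjacency(A):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(A_norm, H, W):
    """One graph-convolution layer with ReLU activation."""
    return np.maximum(A_norm @ H @ W, 0.0)


def multi_channel_gcn(adjacencies, X, weights):
    """Run a separate GCN layer per channel and concatenate the outputs."""
    outputs = [
        gcn_layer(normalize_adjacency(A), X, W)
        for A, W in zip(adjacencies, weights)
    ]
    return np.concatenate(outputs, axis=1)


rng = np.random.default_rng(0)
n_nodes, in_dim, out_dim = 5, 4, 3  # nodes stand for drug-protein pairs
X = rng.normal(size=(n_nodes, in_dim))  # hypothetical initial pair features

# Three hypothetical channels, each a symmetric 0/1 adjacency matrix.
channels = []
for _ in range(3):
    upper = np.triu(rng.integers(0, 2, size=(n_nodes, n_nodes)), k=1)
    channels.append(upper + upper.T)

weights = [rng.normal(size=(in_dim, out_dim)) for _ in channels]
Z = multi_channel_gcn(channels, X, weights)  # shape (n_nodes, 3 * out_dim)
```

Concatenation keeps each channel's embedding separate so that a downstream predictor can weight the channels; summing or attention-weighted fusion are common alternatives.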

Keywords/Search Tags:

Ensemble Learning, Site Prediction, Deep Learning, Multi-view Learning, Heterogeneous Network

Related items

1	Research On Deep Learning Model Of Moonlighting Protein And LncRNA Prediction Based On Multi-source Heterogeneous Feature Fusion
2	Deep Learning Metallogenic Prospect Prediction Method Based On Ensemble Learning Idea
3	Research On Network Representation Learning Methods Based On Deep Learning
4	Prediction Of Enhancers And N4 Methylation Sites Based On Ensemble Learning And Deep Learning
5	Research On Intelligent Prediction Methods For Multi-Level Enzyme Function
6	Research And Application Of Multi-task-oriented Network Representation Learning Method
7	The Research On The Prediction Method Of Protein Succinylation Sites Based On PU Learning And Deep Learning Technology
8	The Study For The Prediction Of Protein Ubiquitination Sites Based On Deep Learning
9	Research Of Models And Methods For Multi-Granular Network Representation Learning
10	Precipitation Forecast Spatiotemporal Sequence Prediction Research Based On The Fusion Of Deep Learning And Ensemble Learning