| Proteins are biological macromolecules with complex structures and diverse functions; they carry out the many activities of living organisms in a complex yet coordinated manner and occupy a central position in them. Protein function prediction, a major research topic of the post-genomic era, is crucial for understanding cellular and biological mechanisms, studying disease mechanisms, and discovering new drugs. Within protein function prediction, the accurate prediction of lysine crotonylation sites and of drug-protein interactions is particularly important. With the rapid development of machine learning and deep learning, many computational methods have emerged, but they still have notable limitations. Crotonylation site prediction usually involves extreme data imbalance, with far fewer positive samples than negative samples; drug-protein interaction prediction suffers from, among other problems, insufficient learning of diverse biological associations. To address these problems, this thesis proposes two corresponding methods. The main contributions are as follows: (1) The protein crotonylation site prediction method SEBP_HNHC is proposed, based on a two-layer stacking ensemble iteration strategy. First, to make full use of the information in the dataset, a new data partitioning strategy, SED, is proposed, which partitions the training set twice to construct two-layer training subsets. Second, nine encoding methods covering sequence information, physicochemical information, and evolutionary matrix features are used for multi-view feature extraction; to remove redundant information, the optimal feature subset is selected using the Gini coefficient and forward feature selection. On the first-layer training data, the initial classification probabilities of the samples are obtained with a support vector machine. To weaken the effect of sample imbalance on prediction performance, the initial classification probabilities are used as the feature vectors of the second-layer training data, and a stacking ensemble model is built by combining seven machine learning algorithms. Finally, drawing on ideas from deep learning and ensemble learning, the prediction probabilities from multiple optimal feature subsets are used as feature inputs to the deeper layer, and the final prediction is obtained through ensemble iterations. Experimental results verify the effectiveness and feasibility of SEBP_HNHC. (2) The drug-protein interaction prediction method DMNDTI is proposed, based on dual multi-view network learning. First, the dataset is collected and expanded, and auxiliary data such as diseases and side effects are integrated to construct a new large-scale dataset. One of the two multi-view components of DMNDTI uses meta-paths and denoising autoencoders to learn from protein- and drug-related heterogeneous networks, while the other uses multi-channel graph convolutional networks to learn from drug-protein similarity networks. The overall DMNDTI framework consists of four modules: meta-path-based representation learning of raw drugs and proteins, construction of multi-view drug-protein pair networks, representation learning on three-channel drug-protein pair networks, and drug-protein interaction prediction. Experiments show that DMNDTI achieves good prediction performance. |
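
The feature selection step described for SEBP_HNHC (Gini-based ranking followed by forward feature selection) can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so the code below rests on assumptions: the "Gini coefficient" is taken to mean the decrease in Gini impurity from the best single-threshold split of a feature, and forward selection is taken to mean adding features in ranked order and keeping each one only if a score improves. The function names (`gini_gain`, `rank_by_gini`, `forward_select`) and the `score` callback are hypothetical and are not taken from the thesis.

```python
from typing import Callable, List, Sequence


def gini_impurity(labels: Sequence[int]) -> float:
    """Gini impurity of binary labels (0/1): 1 - p0^2 - p1^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 * p1 - (1.0 - p1) ** 2


def gini_gain(column: Sequence[float], labels: Sequence[int]) -> float:
    """Largest impurity decrease over all single-threshold splits of one feature."""
    n = len(labels)
    parent = gini_impurity(labels)
    pairs = sorted(zip(column, labels))
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        child = (len(left) * gini_impurity(left)
                 + len(right) * gini_impurity(right)) / n
        best = max(best, parent - child)
    return best


def rank_by_gini(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[int]:
    """Feature indices ordered from most to least informative."""
    n_features = len(X[0])
    gains = [gini_gain([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: gains[j], reverse=True)


def forward_select(ranked: Sequence[int],
                   score: Callable[[List[int]], float]) -> List[int]:
    """Walk the ranked features; keep a feature only if the score improves.

    `score` is an assumed callback, e.g. cross-validated performance of a
    classifier trained on the candidate feature subset.
    """
    selected: List[int] = []
    best = float("-inf")
    for j in ranked:
        s = score(selected + [j])
        if s > best:
            selected.append(j)
            best = s
    return selected
```

In practice `score` would wrap a cross-validated classifier, and under class imbalance an imbalance-aware metric would be a natural choice; both are left to the caller here.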