| With the development of bioinformatics, there is an increasing demand for explaining and studying protein function. Most proteins perform their functions by interacting with other proteins, so the exploration of protein-protein interactions (PPIs) has attracted extensive attention. At present, the density of known relationships in PPI data is very low and cannot meet the needs of practical applications in the life sciences; a large number of PPIs remain to be explored. However, because biological experiments are time-consuming and costly, the PPI data obtained experimentally are highly sparse. In addition, discovering key PPIs is more important than predicting all unknown PPIs. This paper therefore focuses on predicting key protein-protein interactions and proposes algorithms based on the variational autoencoder (VAE) for representation learning on high-dimensional, sparse protein-protein interaction networks (PPINs). The main research contents are as follows. First, the data are preprocessed by applying a threshold, and an undirected key PPIN is constructed. Building on a VAE with a Gaussian distribution, information weighting parameters are introduced so that the model can better represent and learn from high-dimensional, sparse, and incomplete PPIN data. A regularization coefficient is introduced into the original VAE loss function and updated by an annealing method, which makes the VAE better suited to predicting key PPIs. The results show that the proposed algorithm achieves good performance on the key PPI prediction problem. Second, exploiting the good scalability of the VAE, this paper combines the VAE with the idea of bagging and proposes a data-division and probability-summation architecture: multiple base predictors are trained and then integrated into a strong key PPI prediction model by summing their predicted probabilities. This extension of the VAE model structure improves the accuracy and robustness of the algorithm, and distributed computing effectively reduces the computational cost of the original method. Finally, to address the shortcomings of traditional hyperparameter optimization methods, this paper applies the particle swarm optimization algorithm to the hyperparameter optimization of the model, so that the VAE model adapts its hyperparameters automatically. This update scheme reduces the time cost of parameter tuning, lets the model adjust its hyperparameters dynamically during training, and helps the model move in a better optimization direction faster during training. In summary, this study proposes a key PPI prediction algorithm and improved variants that can effectively handle high-dimensional, sparse, and incomplete PPIN data. The predicted key PPIs can provide a basis for mining new PPIs, help researchers understand the functions and roles of proteins, and contribute to disease diagnosis and pathological research. |
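
The abstract does not give the exact loss function, weighting scheme, or annealing schedule, so the following is only a minimal NumPy sketch of how an annealed, weighted VAE objective and a probability-summation ensemble might look. The function names, the linear warm-up schedule, the weight `alpha` on unobserved entries, and the Bernoulli reconstruction term are all assumptions made for illustration, not the paper's definitions.

```python
import numpy as np


def annealed_beta(step, warmup_steps, beta_max=1.0):
    """Hypothetical schedule: ramp the regularization coefficient
    linearly from 0 to beta_max over warmup_steps training steps."""
    return beta_max * min(1.0, step / warmup_steps)


def weighted_vae_loss(x, x_prob, mu, log_var, beta, alpha=0.1):
    """Weighted reconstruction loss plus beta-scaled KL term.

    x        : binary adjacency rows, shape (batch, n_proteins)
    x_prob   : decoder output probabilities, same shape as x
    mu       : encoder means, shape (batch, latent_dim)
    log_var  : encoder log-variances, same shape as mu
    alpha    : assumed weight for unobserved entries (x == 0)
    """
    eps = 1e-7
    x_prob = np.clip(x_prob, eps, 1.0 - eps)
    # Observed interactions get weight 1, unobserved entries get alpha,
    # so missing links are not treated as confident negatives.
    weights = np.where(x > 0, 1.0, alpha)
    bce = -(x * np.log(x_prob) + (1.0 - x) * np.log(1.0 - x_prob))
    recon = np.sum(weights * bce, axis=1)
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, I).
    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)
    return float(np.mean(recon + beta * kl))


def sum_probabilities(prob_list):
    """Combine base predictors by summing their predicted probabilities
    element-wise; candidate PPIs can then be ranked by the summed score."""
    return np.sum(np.stack(prob_list), axis=0)
```

In such a setup, each base VAE would be trained on its own partition of the data with `beta = annealed_beta(step, warmup_steps)` recomputed at every step, and the per-model score matrices would be passed to `sum_probabilities` to rank candidate key PPIs.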