
Protein-protein Interactions Estimation Based On Latent Factor Analysis Of High Dimensional And Sparse Matrix

Posted on: 2018-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: J P Sun
GTID: 2310330536969191
Subject: Computer Science and Technology
Abstract/Summary:
Confirming the interaction between two proteins and understanding its effects is key to understanding life processes. A complex network built from protein-protein interaction (PPI) data can provide useful and comprehensive information for uncovering the mechanisms of life. Many researchers have confirmed interactions between proteins in model organisms through biological experiments, but such experiments progress slowly, which makes it impossible to map out a complete protein interaction network this way. As data mining technology has matured, many researchers have applied data mining algorithms to protein analysis tasks in bioinformatics to predict protein-protein interactions, and the predictions can in turn guide experiments. Existing methods for protein-protein interactions, such as clustering and Bayesian approaches, are simple algorithms with low accuracy and efficiency; they handle small datasets and struggle to analyze high-dimensional and sparse data. Recently, the number of known proteins in biological data has become very large, with a large number of unknown interactions among them. Such data form a high-dimensional sparse network. The existing algorithms have difficulty predicting accurately on such data and cost too much time. The latent factor model, however, can analyze high-dimensional sparse networks well because it depends only on the known data. In this thesis, we first introduce the Matrix Factorization (MF) model based on singular value decomposition (SVD), and then present the general Latent Factor (LF) model. The LF model can handle High Dimensional Sparse (HiDS) matrices, but the basic LF model may overfit, and it usually deals with bipartite networks, whereas a PPI network is an undirected graph. We therefore make a variety of improvements to the LF model so that it predicts protein-protein interactions more accurately and more efficiently. The main work of this thesis is as follows:
1) Based on the general LF model, we add non-negativity constraints through the non-negative multiplicative update principle, add regularization to the objective function to avoid overfitting, and add frequency weighting. The resulting novel LF models show improved accuracy and efficiency.
2) Focusing on the characteristics of protein-protein interaction data, we develop a Symmetric Non-negative LF (SNLF) model following the principle of the Symmetric Matrix Factorization model, and we consider linear bias to improve accuracy. We compare and analyze the resulting LF models and find that SNLF performs better on the PPI prediction task than the other compared models.
3) Based on these results, a graphical interface display system for protein-protein interaction prediction is implemented in Java.
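To illustrate the general idea behind a symmetric, non-negative, regularized latent factor model that learns only from known entries, here is a minimal sketch. It is not the thesis's SNLF update rule: it uses a generic damped multiplicative update for symmetric non-negative factorization, omits the linear bias and frequency weighting, and stores the matrix densely for readability. The function name `snlf_sketch`, its parameters (`rank`, `lam`, `beta`, `iters`), and the toy data are all assumptions made for this example.

```python
import numpy as np

def snlf_sketch(A, mask, rank=2, lam=0.01, beta=0.5, iters=300, seed=0):
    """Fit A ~ X @ X.T with X >= 0, using only entries where mask == 1.

    Illustrative only: a generic damped multiplicative update, not the
    update rule derived in the thesis.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X = rng.random((n, rank)) + 0.1      # strictly positive initialization
    eps = 1e-12                          # guards against division by zero
    losses = []
    for _ in range(iters):
        R = X @ X.T                      # current symmetric reconstruction
        # Squared error on known entries plus L2 regularization.
        losses.append(np.sum(mask * (A - R) ** 2) + lam * np.sum(X ** 2))
        num = (mask * A) @ X
        den = (mask * R) @ X + 0.5 * lam * X + eps
        # The factor is always >= 1 - beta > 0, so X stays non-negative.
        X = X * (1.0 - beta + beta * num / den)
    return X, losses

# Toy undirected network (hypothetical data): two groups of three proteins,
# with interactions only inside each group.
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0

# Symmetric mask of known entries: the diagonal and two pairs are unknown.
mask = np.ones((6, 6)) - np.eye(6)
for i, j in [(0, 2), (3, 5)]:
    mask[i, j] = mask[j, i] = 0.0

X, losses = snlf_sketch(A, mask)
A_hat = X @ X.T                          # symmetric scores for every pair
print("score for hidden pair (0, 2):", A_hat[0, 2])
print("score for cross-group pair (0, 3):", A_hat[0, 3])
```

Because the reconstruction is `X @ X.T`, the predicted scores are symmetric by construction, which matches the undirected nature of a PPI network. For real HiDS data one would iterate over the known entries in a sparse format rather than multiplying dense masks.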
Keywords/Search Tags: Protein-protein Interaction, Prediction, Latent Factor, High Dimensional Sparse Matrix, Symmetry