
Research On Detecting Protein Complexes Based On Supervised Learning And Unsupervised Learning

Posted on: 2021-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: J Yang
Full Text: PDF
GTID: 2428330647461964
Subject: Software engineering
Abstract/Summary:
Proteins are the foundation of life activities, and protein complexes are the form in which proteins carry out specific biological functions. Detecting protein complexes is therefore of great significance to biological research. Traditional biological experiments for detecting protein complexes have many disadvantages, such as high cost. In recent years, with the growth of protein interaction data and the introduction of the protein-protein interaction (PPI) network model, detecting protein complexes from PPI networks by computational methods has become a research hot spot.

Current computational methods can be classified as unsupervised learning or supervised learning. Unsupervised learning applies graph theory to analyze the topological features of protein complexes in order to detect them. However, real PPI networks contain a lot of noise, which distorts the analysis of complex topology and leads to errors when unsupervised algorithms detect complexes. The main idea of supervised learning is to use the information in known protein complexes: a supervised model learns what the known complexes have in common and then applies it to detect new ones. However, the set of known complexes is incomplete and the data sets contain false positives, so relying only on known-complex data is not fully justified and reduces detection accuracy.

This thesis proposes the DCRA algorithm to address the shortcomings of unsupervised learning; it reduces the influence of noise nodes during detection. To address the problems of supervised learning, the thesis proposes the XGBP algorithm, which combines a supervised learning algorithm with the topological information of protein complexes. Experiments show that XGBP improves the accuracy of protein complex detection. The main contributions of the thesis are:

(1) To handle the noise and uncertainty in PPI networks, a method for detecting complexes on an uncertain graph by removing articulation points (DCRA) is proposed, based on the uncertain graph model. By removing articulation points, DCRA reduces the impact of noise. Experiments verify that it improves accuracy compared with other unsupervised learning algorithms.

(2) DCRA introduces two parameters. Adjusting them effectively controls the relationship between sensitivity and accuracy, so the algorithm can be tuned to the needs of different scenarios.

(3) To address the problems of supervised learning, a method based on XGBoost and topological structure information (XGBP) is proposed. Rather than relying only on existing protein complex data sets, the algorithm also uses the topological features of protein complexes, which effectively compensates for the incompleteness of the data sets. The currently popular XGBoost model is selected as the learning model. Experiments show that the algorithm has a clear advantage in detection accuracy over popular supervised learning algorithms.
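The abstract does not spell out the DCRA procedure, so the following is only a minimal sketch of the general idea behind contribution (1): treat each PPI edge as having an existence probability, discard unlikely edges, remove articulation points (nodes whose removal disconnects the graph), and keep the remaining connected groups as candidate complexes. The function names and the two thresholds `min_prob` and `min_size` are hypothetical, chosen for illustration; they are not the two parameters of the thesis, which the abstract leaves unnamed.

```python
from collections import defaultdict


def build_graph(weighted_edges, min_prob):
    """Keep only edges whose existence probability is at least min_prob.

    min_prob is a hypothetical threshold, not a parameter from the thesis.
    """
    adj = defaultdict(set)
    for u, v, p in weighted_edges:
        if u != v and p >= min_prob:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def articulation_points(adj):
    """Return the articulation points of an undirected graph (Tarjan's DFS).

    Recursive for readability; a large network would need an iterative DFS.
    """
    disc, low, points = {}, {}, set()
    counter = 0

    def dfs(u, parent):
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        children = 0
        for v in adj[u]:
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # A non-root node cuts the graph if a child subtree cannot
                # reach any ancestor of u without passing through u.
                if parent is not None and low[v] >= disc[u]:
                    points.add(u)
            elif v != parent:
                low[u] = min(low[u], disc[v])
        # The DFS root cuts the graph only if it has several DFS children.
        if parent is None and children > 1:
            points.add(u)

    for node in list(adj):
        if node not in disc:
            dfs(node, None)
    return points


def components_without(adj, removed, min_size):
    """Connected components after deleting `removed`, keeping large ones."""
    seen, result = set(removed), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], {start}
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    stack.append(v)
        if len(comp) >= min_size:
            result.append(comp)
    return result


def detect_candidates(weighted_edges, min_prob=0.5, min_size=3):
    """Illustrative pipeline: filter edges, cut at articulation points."""
    adj = build_graph(weighted_edges, min_prob)
    return components_without(adj, articulation_points(adj), min_size)


# Toy network: two triangles bridged by node X, plus one unlikely edge C-F.
toy_edges = [
    ("A", "B", 0.9), ("B", "C", 0.9), ("A", "C", 0.9),
    ("D", "E", 0.9), ("E", "F", 0.9), ("D", "F", 0.9),
    ("X", "A", 0.8), ("X", "B", 0.8), ("X", "D", 0.8), ("X", "E", 0.8),
    ("C", "F", 0.1),
]
candidates = detect_candidates(toy_edges)
```

In this toy network, X is the only articulation point once the low-probability edge C-F is filtered out, so removing it separates the two triangles into two candidate groups. If the C-F edge were kept, X would no longer be an articulation point, which shows how the edge threshold and the cut step interact.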
Keywords/Search Tags: Protein Complex Detection, Uncertain Graph, XGBoost Model, Graph Data Mining