
Research On Algorithm Of Identifying Protein Complexes And Essential Proteins On PPI Network

Posted on: 2021-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Liu
GTID: 2370330611463422
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology in interdisciplinary biological research and the continuous increase in biological gene data, a large number of protein databases have been established, making the analysis of the functional expression, action environment, production impact, and compositional structure of protein networks a hot research topic in biology. In particular, discovering protein complexes and essential proteins in protein-protein interaction (PPI) networks is of great value for studying the mechanisms of disease and for drug development. Although research on protein complexes and essential proteins has seen many breakthroughs in recent years, much of the protein data is obtained by a variety of high-throughput techniques. Because of the complexity, unreliability, and small-world characteristics of PPI networks, as well as the limitations of experimental measurement, the resulting data contain a high proportion of false positives and false negatives, which leads to low recognition accuracy. In addition, many algorithms do not distinguish between protein complexes and functional modules. Accurately mining protein complexes and essential proteins from PPI networks therefore remains challenging.

This paper proposes a fuzzy ant colony clustering algorithm, a modularity function algorithm, a fuzzy spectral clustering algorithm, and a mining algorithm based on complex participation and density to mine protein complexes and essential proteins. The work proceeds from two aspects. On the one hand, given the complexity of the PPI network itself and the defects of protein data, a weighted protein network and an uncertain network are constructed by combining the topology and the biological information of the network. On the other hand, fuzzy ant colony clustering, a modularity function, and fuzzy spectral clustering are proposed to make up for the shortcomings of traditional module mining algorithms. At the same
time, several improvement strategies are proposed to address problems in these algorithms themselves. The improved algorithms are then used to mine protein complexes, and essential proteins are identified on the basis of the mined complexes. The main studies of this paper are as follows:

In ant colony clustering, complex recognition is degraded by false positives and by the large number of merge, filter, and repeated pick-up and drop operations. The FCM clustering algorithm, in turn, is sensitive to the cluster centers and the number of clusters, updates its membership function slowly, and uses an objective function that considers only inter-cluster variation. These issues reduce the accuracy, recall, and time performance of protein complex prediction. To deal with these problems, this paper proposes a weighted protein complex recognition algorithm based on fuzzy ant colony clustering, named FAC-PC. First, the edge clustering coefficient and the Pearson correlation coefficient are used to construct a weighted protein network. Two formulas for selecting essential proteins and essential group proteins are designed, and the essential group proteins are used in place of seed nodes. A weighted similarity measure is used to optimize the pick-up and put-down probabilities of the ant colony algorithm, and the ant colony clustering process is then simulated to initialize the FCM algorithm. At the same time, a membership update strategy and an objective function that balances intra-cluster and inter-cluster variation are proposed to optimize the FCM algorithm. Finally, protein complexes are identified by the improved FCM algorithm. The experimental results show that this algorithm obtains more accurate clustering results than other protein complex mining algorithms.

Protein complex mining algorithms based on the modularity function analyze only the topological characteristics of the network without considering biological information, so it is difficult to
identify overlapping and small-scale complexes, and the experimental results are easily affected by false positives and noisy data. To address the resulting low accuracy, low recall, and low execution efficiency, a weighted protein complex mining algorithm based on the modularity function is proposed, named IWPC-MF. The weighted protein network is constructed using the edge clustering coefficient, the point clustering coefficient, and the Pearson correlation coefficient. Seed nodes are selected according to node weight, and the neighbors of each seed node are traversed. Furthermore, a similarity measure and a protein attachment degree between nodes are designed to obtain the initial clustering modules. Finally, a modularity function based on tightness is used to merge the initial modules and complete the complex recognition. Comparative analysis shows that this algorithm identifies protein complexes more accurately.

Existing methods for mining protein complexes in PPI networks based on spectral clustering and FCM clustering have low accuracy and low running efficiency and are susceptible to false positives. To address this, a method for mining protein complexes in uncertain PPI networks based on fuzzy spectral clustering is proposed, named FSC-PC. First, an uncertain PPI network is constructed using the edge clustering coefficient. Second, the similarity calculation of spectral clustering is modified with a flow distance strategy based on the edge clustering coefficient, to overcome the sensitivity of spectral clustering to the scaling parameter. The spectral clustering algorithm is then used to preprocess the uncertain PPI network data and reduce its dimensionality. Third, a density-based probability center selection strategy is designed to obtain the initial cluster centers and the number of clusters for the FCM algorithm, and the cluster centers and membership degrees are updated iteratively to obtain the protein complexes. Finally, an improved EDD
is used to filter the protein complexes. The experimental results show that this algorithm is more accurate than other complex prediction algorithms.

Essential protein recognition methods based on the PPI network consider only the network structure, while recognition algorithms based on complex information do not fully consider the neighborhood information of protein nodes during clustering or the effect of complex mining quality on essential protein recognition. The result is low recognition accuracy and low specificity. By comprehensively considering the topological characteristics and biological information of the protein network, this paper proposes a method based on the participation and density of complexes, named PEC. In this method, GO annotation information and the edge clustering coefficient are fused to construct the weighted protein network. The maximum difference between eigenvectors and the protein node degree are used to determine the number of partitions and the initial cluster centers of the FCM algorithm, and the FCM algorithm is then used to mine the complexes. Furthermore, essential proteins are mined using essential node scores based on complex participation and the density of each node's neighborhood subgraph. The experiments show that this method achieves better accuracy than topological centrality methods and recognition methods based on complex information.
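
The network-weighting step shared by FAC-PC and IWPC-MF (edge clustering coefficient combined with the Pearson correlation coefficient) can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the edge clustering coefficient is taken in one common form, ECC(u, v) = z / min(d_u - 1, d_v - 1), where z is the number of neighbors shared by u and v; the Pearson coefficient is computed on hypothetical gene expression profiles and rescaled to [0, 1]; and the two are blended linearly with a hypothetical parameter `alpha`. The function names and the toy data are likewise invented.

```python
from math import sqrt


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_clustering_coefficient(adj, u, v):
    """ECC(u, v) = shared neighbors / min(d_u - 1, d_v - 1); 0 if undefined."""
    shared = len(adj[u] & adj[v])
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return shared / denom if denom > 0 else 0.0


def pearson(x, y):
    """Pearson correlation of two equal-length profiles; 0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def weight_network(edges, expression, alpha=0.5):
    """Assumed blend: alpha * ECC + (1 - alpha) * rescaled Pearson.

    This linear combination is an illustrative choice, not the thesis formula.
    """
    adj = build_adjacency(edges)
    weights = {}
    for u, v in edges:
        ecc = edge_clustering_coefficient(adj, u, v)
        pcc = (pearson(expression[u], expression[v]) + 1) / 2  # map [-1, 1] to [0, 1]
        weights[(u, v)] = alpha * ecc + (1 - alpha) * pcc
    return weights


# Toy PPI network and made-up expression profiles (hypothetical data).
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
expression = {
    "A": [1.0, 2.0, 3.0],
    "B": [2.0, 4.0, 6.0],
    "C": [3.0, 2.0, 1.0],
    "D": [1.0, 1.0, 1.0],
}
weights = weight_network(edges, expression)
for edge, w in sorted(weights.items()):
    print(edge, round(w, 3))
```

In this toy network the edge C-D lies in no triangle, so its topological term is zero and its weight comes entirely from the expression term. This is the general idea behind using such weights to down-rank likely false-positive interactions before clustering.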
Keywords/Search Tags: protein-protein interaction network, protein complex, essential protein, protein recognition