Essential Protein Identification Via Graph Structure And Multi-source Information Fusion

Posted on: 2022-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Fei
Full Text: PDF
GTID: 2480306491453354
Subject: Computer Science and Technology
Abstract/Summary:
Essential proteins are an important material basis for cell survival, and their absence can cause disease or even death of the organism. Accurate and efficient identification of essential proteins helps in understanding cell functions and provides important guidance for drug design. In recent years, high-throughput technologies have developed rapidly, and a large amount of protein-protein interaction (PPI) data has been mined. Many essential protein identification methods based on PPI networks have been proposed. These methods can be roughly divided into two categories: those based on the topological characteristics of the network, and those that additionally fuse other biological information about proteins, such as subcellular location. The importance of a protein depends not only on its topological characteristics but also on biological information that reflects its biological characteristics. How to effectively integrate multiple kinds of biological information with network topology information to improve the recognition rate therefore remains an urgent open problem.

To address this problem, this thesis analyzes network topology and biological characteristics in depth, fuses multi-source biological information such as subcellular location, and proposes two different essential protein identification methods. The main research contents are as follows:

(1) Multi-Source Fusion Essential Protein Identification Algorithm Based on Weighted Subnetwork Participation Degree. To improve the accuracy of existing essential protein identification, an efficient identification method (Participation Degree of a Protein in Multiple Data Source Weighted Subnetwork, PDWS) is proposed. First, protein subcellular location information and edge clustering coefficients are fused, and weights are assigned to the PPIs based on the network topology characteristics and the importance of each protein's subcellular location. Second, on the resulting weighted PPI network, two indices are proposed to measure the importance of proteins in the network: participation in subcellular location compartment subnetworks and participation in protein complex subnetworks. Finally, the characteristics of the two kinds of subnetworks are integrated to design the essential protein identification method PDWS. The experimental results show that on the DIP and Krogan datasets, the recognition accuracy of this method reaches 76% and 73%, respectively, which is higher than that of the other methods compared.

(2) Identification of Essential Proteins Based on Local Functional Density Via Multi-Source Information Fusion. Existing methods for identifying essential proteins do not fully consider biological information and are not effective at identifying essential proteins with low connectivity. To address this, a new essential protein identification method (Local Functional Density Via Multi-Source Information Fusion, LFDI) is proposed. In addition to the two kinds of biological information fused by PDWS, this method integrates GO annotations and gene expression profile information, and proposes a local functional density centrality measure, a protein complex participation degree, and a subcellular localization score to measure the importance of proteins in the network. To verify the performance of LFDI, it was tested on two PPI datasets, DIP and Krogan. The experimental results show that, compared with the recent PCSD method, LFDI can identify some essential proteins with low connectivity and has a higher recognition rate.

In summary, this thesis proposes new identification methods to address the limited recognition accuracy of existing essential protein identification methods. The method proposed in Chapter 3 mainly integrates two kinds of biological information and uses the topological features of the constructed weighted subnetworks to identify essential proteins, so it is less dependent on network reliability. The method in Chapter 4 integrates more biological information to construct a weighted network, proposes local functional density as an evaluation indicator, and evaluates protein importance by combining three indicators: network topology characteristics, subnetwork topology characteristics, and subcellular scores calculated using only biological information. This method considers network topology and biological information more comprehensively, and it can be used to identify essential proteins in networks for which complete biological information is available. The experimental results confirm that the two proposed methods are effective in identifying essential proteins and achieve high recognition accuracy.
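The abstract names the edge clustering coefficient as one ingredient of the PPI edge weights but does not give a formula. As a rough illustration only, the sketch below assumes the definition commonly used in PPI network analysis, ECC(u, v) = z / min(d_u - 1, d_v - 1), where z is the number of triangles containing the edge (u, v) and d_u, d_v are the node degrees. The function name and the toy network are hypothetical, and the fusion with subcellular location information described in the thesis is not reproduced here.

```python
from collections import defaultdict


def edge_clustering_coefficients(edges):
    """Return {(u, v): ECC} for an undirected, unweighted PPI edge list.

    Assumed definition (not taken from the thesis):
        ECC(u, v) = z / min(d_u - 1, d_v - 1)
    where z counts the common neighbours of u and v (triangles on the edge).
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            neighbors[u].add(v)
            neighbors[v].add(u)

    ecc = {}
    for u, v in edges:
        if u == v:
            continue
        triangles = len(neighbors[u] & neighbors[v])
        max_possible = min(len(neighbors[u]) - 1, len(neighbors[v]) - 1)
        # An edge whose endpoint has degree 1 cannot lie on any triangle.
        ecc[(u, v)] = triangles / max_possible if max_possible > 0 else 0.0
    return ecc


# Toy network: a triangle A-B-C with a pendant protein D attached to C.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
toy_ecc = edge_clustering_coefficients(toy_edges)
```

In this toy network the three triangle edges receive the maximum coefficient and the pendant edge C-D receives zero, which is the kind of topological signal an edge-weighting step could combine with biological information.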
Keywords/Search Tags:computational biology, protein interaction network, essential protein identification, multi-source information fusion, complex network