
Algorithms for Identifying Essential Proteins and Protein Complexes on Protein-Protein Interaction Networks

Posted on: 2022-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: C Liu
GTID: 2480306608489864
Subject: Computer Software and Application of Computer
Abstract/Summary:
Proteins are the material basis of organisms, and the activities of organisms are inseparable from the functions that proteins perform. A protein can exert its biological function in two ways: on its own, as an essential protein that underpins the survival of the organism, or together with other proteins that aggregate to form a protein complex. From the perspective of individual proteins, proteins fall into two categories: essential and non-essential. The absence of an essential protein causes death or disease in the organism. Accurate identification of essential proteins therefore supports the study of cell functions and provides important guidance for drug design. From the perspective of protein groups, most individual proteins cannot carry out biological activities independently; they must combine with other proteins into protein complexes, which achieve biological functions and act as carriers of biological processes. Effective identification of protein complexes therefore not only deepens the understanding of biological organization principles and functional mechanisms but also helps in the diagnosis and treatment of various diseases.

With the rapid development of high-throughput technologies, massive amounts of Protein-Protein Interaction (PPI) data have been generated, providing a data basis for methods that identify essential proteins and protein complexes in protein interaction networks. Apart from biological experiments, early computational approaches for identifying essential proteins and protein complexes mainly used the topological features of the Protein-protein Interaction Network (PIN). As biological experiments have advanced, biological data have become increasingly rich, and identification algorithms for essential proteins and protein complexes that integrate other biological information have also been developed. As research has evolved, scholars have found that the
performance of identification methods depends not only on the use of topological features but also on combining information about the biological properties of proteins. Existing essential protein identification methods still need improvement in two respects: the analysis of the high-order neighbor structure of protein nodes, and strategies for integrating multiple kinds of biological information with network topology information. Existing research on protein complex identification lacks analysis of the interactions and connections between protein complexes and essential proteins, as well as analysis of the relationship between individual proteins and the overall composition of a complex.

To address these problems, this research investigates how to identify essential proteins and protein complexes in real protein interaction networks. It first briefly introduces the research status of essential proteins and protein complexes, the topological features of protein interaction networks, the biological features relevant to identification algorithms, and the current mainstream identification methods. The description of the essential protein and protein complex identification problems provides the theoretical basis for designing the different identification methods. Using the topology of the protein interaction network together with biological information data, essential proteins are studied by means of network topology theory and data fusion. Building on an in-depth analysis of network topology and biological characteristics, the external connection between protein complexes and essential proteins is explored, and the connection between protein nodes and the whole complex is calculated with a method designed around the core-attachment structure in order to identify protein complexes.

The main work and innovations of this thesis are as follows:

(1) Because the identification accuracy of existing essential protein identification methods can still be
improved, the h-quasi-cliques and Fusion of multiple data sources (QCF) method is proposed. It is based on a correlation analysis between the h-quasi-clique topology of nodes in the PIN and their criticality, and on biological information other than PPI data that is either fused with the PIN or used on its own to measure protein criticality. The method evaluates the influence of the h-quasi-clique topological structure on criticality. After a new network is constructed, topological properties are calculated on it and combined with bioinformatic metrics to identify essential proteins from multiple perspectives. First, QCF combines the PIN with gene expression profiles to construct dynamic PINs, reducing the influence of noise in the static network. Second, in the dynamic PINs, topological features and protein functional annotations are combined to calculate protein topological scores. Finally, the topological scores are fused with three protein biological information scores to calculate protein criticality. To verify the performance of QCF, 16 methods, including MON, TEGS and LBCC, are tested and compared on 3 datasets. The results show that QCF performs well on indicators such as the number of essential proteins identified, F-measure and ACC. The average prediction accuracy for the top 100 and the top 600 is 88.3% and 67.7%, respectively, the average F-measure is 0.5674, and the average ACC is 0.7581, all better than the other methods.

(2) Most existing methods search only local topological information and mine dense subgraphs as protein complexes, while ignoring the intrinsic composition of protein complexes. To solve this problem, the Core Attachment and Essential Protein (CAEP) method is proposed. The method detects protein complexes by applying the core-attachment structure to a dynamic protein interaction network that is weighted using essential proteins and GO annotations. The first steps are to define how weights are assigned to protein interactions, adjust the weights using essential proteins and other biological information, assign
weights to dynamic protein interactions, identify the cores of protein complexes based on preset fixed structures and common neighbors, and use the cores to identify the attachment proteins. Finally, the identified cores and attachment proteins are combined to form protein complexes, which are then processed to remove redundancy. To evaluate the method, CAEP was compared with nine other identification methods on two yeast datasets, DIP and BioGRID. Experimental results show that CAEP outperforms the compared methods on precision, recall, F1 and Acc. Compared with the COACH method on the DIP dataset, the four performance indicators increase by an average of 15.53% on the standard protein complex dataset NewMIPS and by an average of 15.03% on CYC2008.

The two identification algorithms proposed in this thesis are compared with existing algorithms of the same type on different PPI databases, and the results demonstrate that both perform very well. For essential proteins, the thesis also analyzes some phenomena observed on the protein interaction network during identification and discusses the conditions under which proteins are critical. For protein complexes, an analysis of the similarity among many of the identified complexes suggests that they may be real protein complexes and are irreplaceable. In addition, the proposed algorithms for identifying protein complexes and essential proteins are broadly applicable to research on target identification, classification and clustering in complex networks with similar community structures, and can serve as practical tools.
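
The abstract reports top-k prediction accuracy, F-measure and ACC for essential protein identification but does not define them. The sketch below shows one common way such indicators are computed from a ranked protein list. It is an illustration under stated assumptions, not the thesis's evaluation code: the function name `top_k_metrics` is hypothetical, the k highest-ranked proteins are assumed to be the predicted essential set, and the standard confusion-matrix definitions of precision, recall, F-measure and accuracy are assumed.

```python
from typing import Dict, Sequence, Set


def top_k_metrics(ranked: Sequence[str], essential: Set[str], k: int) -> Dict[str, float]:
    """Score a ranking by treating its k highest-ranked proteins as predicted essential.

    Assumptions (not taken from the thesis): `ranked` lists each protein once,
    best score first, and `essential` is the set of known essential proteins.
    """
    if not 0 < k <= len(ranked):
        raise ValueError("k must be between 1 and the number of ranked proteins")

    predicted = set(ranked[:k])   # proteins predicted essential
    rest = set(ranked[k:])        # proteins predicted non-essential

    tp = len(predicted & essential)   # essential proteins inside the top k
    fp = len(predicted) - tp          # non-essential proteins inside the top k
    fn = len(rest & essential)        # essential proteins outside the top k
    tn = len(rest) - fn               # non-essential proteins outside the top k

    precision = tp / (tp + fp)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Toy ranking with made-up protein names, for illustration only.
    ranking = ["P1", "P2", "P3", "P4", "P5", "P6"]
    known_essential = {"P1", "P3", "P5"}
    print(top_k_metrics(ranking, known_essential, k=2))
```

On this toy input with k = 2, one of the two predicted proteins is essential, so precision is 0.5, recall is 1/3, F-measure is 0.4 and accuracy is 0.5. Under this reading, the "prediction accuracy of the top 100" quoted in the abstract would correspond to the precision value at k = 100, though the abstract does not state this explicitly.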
Keywords/Search Tags: computational biology, protein interaction network, essential protein, protein complex, multi-source information fusion