An Essential Protein Identification Method Based On Fusion Of Multiple Data Sources

Posted on: 2021-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z Yang
Full Text: PDF
GTID: 2370330602499823
Subject: Computer Science and Technology
Abstract/Summary:
Essential proteins play an indispensable role in cell survival and development. They can be identified either by biological experiments or by computational methods. Experimental methods identify essential proteins accurately, but they are time-consuming, expensive and inefficient; computational methods can identify them quickly. To overcome the shortcomings of experimental methods, many computational methods have been proposed. These fall into two types: topology-based methods, which use the topological characteristics of protein interaction networks, and methods that fuse biological information, which combine multiple data sources, such as protein complexes, to identify essential proteins. Because protein interaction data are imperfect, among other reasons, the accuracy of existing essential protein identification methods is low, and improving it remains a challenging task. Building on the topological properties of protein interaction networks and fusing them with biological information about proteins, this thesis proposes two effective methods for identifying essential proteins, PSHC and PSLC.

PSHC is a multi-data-source fusion method for essential protein identification based on structural hole theory and protein complex information. First, PSHC introduces structural hole theory into essential protein identification for the first time; second, it fuses two data sources, the protein interaction network and protein complexes, to identify essential proteins. Experimental results on the DIP and Krogan protein data sets show that, compared with other traditional methods, PSHC identifies more essential proteins with higher accuracy, and that statistical indicators such as sensitivity, specificity, accuracy, positive predictive value, negative predictive value and F-measure are also significantly higher than those of the other methods. Combining structural holes with protein complex information is therefore effective for identifying essential proteins.

PSLC is a multi-data-source fusion method for essential protein identification based on subcellular localization, protein complexes and protein interaction networks. First, PSLC considers not only the internal degree of the protein complex but also its external degree; second, it integrates subcellular localization information to evaluate the importance of a protein. Experimental results on the DIP and Krogan protein data sets show that, compared with other traditional methods, PSLC identifies more essential proteins, and that sensitivity, specificity, accuracy, positive predictive value, negative predictive value, F-measure and other statistical indicators are also significantly higher than those of the other methods.
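
The six statistical indicators named in the abstract have standard definitions. The following is a minimal sketch of how they could be computed for a ranking method, assuming the top k ranked proteins are treated as predicted essential and the rest as predicted non-essential. The function name `evaluate_top_k`, the protein identifiers and the toy labels are hypothetical and are not taken from the thesis; the abstract does not state which cutoff or exact protocol was used.

```python
def evaluate_top_k(ranked, essential, k):
    """Score a ranking against known essential proteins.

    ranked:    protein IDs sorted by descending score (assumed unique)
    essential: set of IDs known to be essential
    k:         number of top-ranked proteins predicted as essential
    """
    predicted = set(ranked[:k])   # predicted essential
    rest = set(ranked[k:])        # predicted non-essential

    tp = len(predicted & essential)   # true positives
    fp = len(predicted) - tp          # false positives
    fn = len(rest & essential)        # false negatives
    tn = len(rest) - fn               # true negatives

    def ratio(num, den):
        # Avoid division by zero on degenerate inputs.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    ppv = ratio(tp, tp + fp)          # positive predictive value
    npv = ratio(tn, tn + fn)          # negative predictive value
    f_measure = ratio(2 * sensitivity * ppv, sensitivity + ppv)
    accuracy = ratio(tp + tn, tp + fp + fn + tn)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Toy example with made-up identifiers and labels (not real DIP or Krogan data).
ranked = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]
essential = {"P1", "P2", "P5"}
metrics = evaluate_top_k(ranked, essential, k=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Comparing two ranking methods at the same k on the same labelled set is one way the "higher than other methods" claims above could be checked.
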
Keywords/Search Tags: Protein interaction network, Structural holes, Protein complex, Subcellular localization, Essential proteins