
Research on Essential Protein Prediction Algorithms in Protein Interaction Networks and Their Applications

Posted on: 2017-01-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Qi
Full Text: PDF
GTID: 1360330488977069
Subject: Computer Science and Technology
Abstract/Summary:
Proteins are the basic components of living organisms, and clarifying their structures and functions is a key task in exploring the mysteries of life. Related studies have shown that proteins differ in their importance to an organism's survival, so they can be divided into essential and non-essential proteins. With the rapid development of high-throughput experimental technologies for detecting relationships among proteins, massive protein-protein interaction data sets have become available. Predicting essential proteins from a network perspective has therefore become a new research hotspot in related disciplines such as biology and medicine. In addition, it is an empirical rule of molecular biology that structure determines function, so research on essential proteins helps deepen the understanding of protein functions in living organisms. For these reasons, this dissertation studies essential protein prediction algorithms at the level of protein interaction networks. Its main topics are essential protein prediction algorithms based on network topology centralities, mechanisms for fusing multiple kinds of protein information, and mechanisms that let prediction algorithms adjust automatically to different network structures. Building on the results for essential protein prediction, the dissertation also studies the identification of protein complexes. The main work and contributions are summarized as follows:

(1) Essential protein prediction algorithms based on the topological centrality of network nodes are an important class of prediction methods. However, this dissertation finds that the design of existing centrality-based algorithms concentrates on mining the characteristics of essential proteins in protein interaction data sets and ignores the structural associations between essential proteins and protein complexes, so their prediction results are usually unsatisfactory. Given these two observations, the dissertation systematically analyzes the topological correlations between real protein complex data sets and essential proteins and proposes a new essential protein prediction algorithm based on the local interaction density of network nodes, named LID. Comparative experiments between LID and existing classical network topology centrality algorithms indicate that the new algorithm gives better prediction results.

(2) Existing research results suggest that it is very difficult to achieve better performance with essential protein prediction algorithms based on a single topological feature of protein networks. Studying new prediction algorithms based on the fusion of multiple kinds of protein information is therefore a natural choice. In existing multi-information fusion algorithms, the fusion mechanism usually relies on manually set empirical parameter values. Such mechanisms require a large number of experiments to find usable parameter values, and once set the values cannot be changed, so these algorithms generally cannot adjust themselves, which reduces their adaptability. This dissertation therefore puts forward a new multi-information fusion mechanism. Using this mechanism, the local interaction density (LID) of a network node is fused with the in-degree information of nodes within real protein complexes to construct a new multi-information fusion essential protein prediction algorithm, named LIDC. The fusion mechanism of LIDC requires no manually set empirical parameters and can adjust itself, to some extent, across several different protein interaction networks. Under several evaluation measures, LIDC achieves better prediction results and greater adaptability than some existing classical multi-information fusion algorithms and than LID, and thus offers one approach to studying mechanisms for fusing multiple kinds of protein information.

(3) Existing essential protein prediction algorithms based on node topological centrality lack self-regulating components. This dissertation argues that a reasonably designed automatic adjustment mechanism can give centrality-based prediction algorithms a degree of adaptability, so that they can carry out prediction on protein interaction data sets with different network structures while reducing dependence on protein biological information as much as possible. The dissertation finds a network topology characteristic named LIDH, the heterogeneity index of local interaction density, which correlates with differences among network structures and can be used to guide the self-adjustment of prediction algorithms. Based on LIDH, an extension of LID is proposed: the local generalized interaction density prediction algorithm, named G-LID, and a prior network set is constructed from protein interaction data sets with G-LID at its core. The algorithm requires no manually set empirical parameters and can adjust itself on several protein interaction networks. Although the algorithm uses prior knowledge, that knowledge still comes from protein interaction data sets, so it does not increase the types of protein biological information required or the dependence on data. On the protein interaction data sets where the prediction performance of the existing classical centrality algorithms and of LID declines faster, G-LID improves prediction performance relative to LID, and it provides a mechanism for adapting to various network structures for research on centrality-based essential protein prediction algorithms.

(4) It is well known that proteins in living organisms generally carry out biological functions through the cooperation of many proteins, and protein complexes are a concrete embodiment of this cooperation. It is therefore important to understand the complex survival mechanisms of living organisms from the perspective of protein interaction networks. Most previously published protein complex prediction algorithms use clustering to predict protein complexes in protein interaction networks; their clustering mechanisms generally focus on how to partition network nodes in a mathematical sense, and they have relatively high time complexity. Meanwhile, biological experiments have found finer structures inside real protein complexes, whose proteins can be further partitioned into core members and affiliated members. Inspired by these facts, this dissertation puts forward a new protein complex prediction algorithm based on the local interaction density of network nodes, named CBLID, which can be regarded as an extension and application of the essential protein prediction results to protein interaction networks. The algorithm first selects seed nodes for clusters according to the local interaction density scores of network nodes. The nodes adjacent to the current seed are then assigned to the cluster belonging to that seed, which completes the clustering process. Finally, redundant clusters are deleted from the cluster set to obtain the candidate set of protein complexes for the current protein interaction network. CBLID has lower time complexity than existing classical protein complex prediction algorithms. Under several evaluation measures, it also achieves better prediction results and greater biological enrichment on multiple protein interaction networks, and thus offers one idea for protein complex prediction research.
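
The three clustering steps described for CBLID (rank seeds by a density score, attach each seed's adjacent nodes, drop redundant clusters) can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the abstract does not give the LID formula or the redundancy criterion, so `placeholder_density` (the fraction of a node's neighbor pairs that are themselves linked), the subset-based redundancy rule, the `min_size` parameter, and all function names are assumptions made only for illustration.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    graph = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-interactions
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def placeholder_density(graph, node):
    """ASSUMED stand-in score, not the dissertation's LID formula:
    the fraction of neighbor pairs of `node` that are themselves connected."""
    neighbors = graph[node]
    if len(neighbors) < 2:
        return 0.0
    pairs = list(combinations(sorted(neighbors), 2))
    linked = sum(1 for a, b in pairs if b in graph[a])
    return linked / len(pairs)


def seed_clusters(graph, score=placeholder_density, min_size=3):
    """Seed selection, neighbor assignment, then redundancy removal."""
    # Step 1: rank candidate seeds by score, highest first (ties by node name).
    seeds = sorted(graph, key=lambda n: (-score(graph, n), n))
    kept = []
    for seed in seeds:
        # Step 2: the seed's cluster is the seed plus its adjacent nodes.
        cluster = frozenset({seed} | graph[seed])
        if len(cluster) < min_size:
            continue
        # Step 3 (ASSUMED rule): a cluster is redundant if an already
        # kept cluster contains all of its members.
        if any(cluster <= other for other in kept):
            continue
        kept.append(cluster)
    return kept


if __name__ == "__main__":
    # Hypothetical toy network: two triangles joined by the edge C-D.
    toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
                 ("D", "E"), ("E", "F"), ("D", "F")]
    for members in seed_clusters(build_graph(toy_edges)):
        print(sorted(members))
```

Because each seed only inspects its own neighborhood, a pass of this kind stays close to linear in the number of edges for sparse networks, apart from the scoring and redundancy checks. Any real scoring function with the same `(graph, node)` signature can be passed in through the `score` argument.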
Keywords/Search Tags:Bioinformatics, Protein Interaction Network, Multiple Information Fusion, Adaptive Adjustment, Essential Protein, Protein Complex