Design and Implementation of a Random-Walk-Based Algorithm for Predicting Protein Function

Posted on: 2015-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: C C Jia
Full Text: PDF
GTID: 2260330431956579
Subject: Computer technology
Abstract/Summary:
In the post-genomic era, predicting protein function is an important task. With the rapid development of high-throughput experimental techniques, more and more protein interaction data are being produced, and many protein interaction networks have been constructed from these data. In this thesis, we use access data to build an integrated protein interaction network. However, the functions of only a small fraction of proteins are known, and the functions of many proteins remain unknown. For such proteins, we can predict their functions from the proteins with known functions that interact with them.

Many existing techniques assume that adjacent proteins in the interaction network have similar functions. Our algorithm instead assumes that proteins with similar functions have similar annotation patterns, whether or not they are adjacent in the network. We use the known functions of proteins to compare the similarity of annotation patterns and thereby predict the unknown functions of proteins.

Based on this assumption, we treat protein function prediction as a multi-label classification problem: the function annotation labels of proteins with known functions form the training sample set, the proteins with unknown functions form the prediction sample set, and we predict unknown protein functions by comparing annotation patterns between the two sets.

Building on this idea, we present a prediction method based on random walks that considers both the local and the global topology of the network. The algorithm takes proteins with known functions as starting points and converts the neighborhood information produced by random walks on the protein interaction network into annotation pattern information. It then uses the traditional KNN algorithm to find, in the training sample set, the k nearest neighbors of each protein with unknown function. Finally, it applies a KNN-based multi-label classification algorithm: it counts the function classes among the k nearest neighbors and, by the maximum a posteriori principle, decides which function labels to assign to each protein with unknown function.

We use type I diabetes access data to construct a protein interaction network for protein function prediction experiments. The results show that the proposed method can effectively predict protein function.
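
The abstract gives no formulas, so the pipeline it describes (random walk, annotation pattern, KNN, maximum a posteriori labelling) can only be illustrated under assumptions. The sketch below is not the thesis's implementation. It assumes a random walk with restart started at the protein being profiled, a restart probability of 0.3, Euclidean distance between annotation patterns, and an ML-KNN-style MAP rule with smoothing parameter `s`. The toy network, the labels `F1`/`F2`, and all function names are hypothetical.

```python
import math

def rwr(adj, start, restart=0.3, iters=100):
    """Random walk with restart: approximate visiting probabilities from `start`."""
    p = {v: 0.0 for v in adj}
    p[start] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for u in adj:
            for v in adj[u]:  # spread the walking mass evenly over neighbours
                nxt[v] += (1 - restart) * p[u] / len(adj[u])
        nxt[start] += restart  # the restart mass jumps back to the start node
        p = nxt
    return p

def profile(adj, labels, protein, functions, restart=0.3):
    """Annotation pattern: walk mass landing on annotated proteins, per function."""
    p = rwr(adj, protein, restart)
    return [sum(p[v] for v in adj if v != protein and f in labels.get(v, ()))
            for f in functions]

def knn(query_vec, train_vecs, k, exclude=None):
    """Names of the k training proteins whose patterns are closest to query_vec."""
    dists = sorted((math.dist(query_vec, vec), name)
                   for name, vec in train_vecs.items() if name != exclude)
    return [name for _, name in dists[:k]]

def mlknn_predict(adj, labels, query, k=2, s=1.0):
    """Assign each function whose smoothed posterior favours 'has the function'."""
    functions = sorted({f for fs in labels.values() for f in fs})
    train = {v: profile(adj, labels, v, functions) for v in labels}
    q_neighbours = knn(profile(adj, labels, query, functions), train, k)
    n = len(train)
    predicted = set()
    for f in functions:
        prior1 = (s + sum(f in labels[v] for v in train)) / (2 * s + n)
        # c1[j] / c0[j]: training proteins with / without f that have
        # exactly j of their k nearest neighbours annotated with f
        c1, c0 = [0] * (k + 1), [0] * (k + 1)
        for v in train:
            j = sum(f in labels[u] for u in knn(train[v], train, k, exclude=v))
            (c1 if f in labels[v] else c0)[j] += 1
        j = sum(f in labels[u] for u in q_neighbours)
        like1 = (s + c1[j]) / (s * (k + 1) + sum(c1))
        like0 = (s + c0[j]) / (s * (k + 1) + sum(c0))
        if prior1 * like1 > (1 - prior1) * like0:
            predicted.add(f)
    return predicted

# Hypothetical toy network: two annotated clusters and one unannotated protein X.
adj = {
    "A1": ["A2", "A3", "X"], "A2": ["A1", "A3", "X"], "A3": ["A1", "A2", "B1"],
    "B1": ["A3", "B2", "B3"], "B2": ["B1", "B3"], "B3": ["B1", "B2"],
    "X": ["A1", "A2"],
}
labels = {"A1": {"F1"}, "A2": {"F1"}, "A3": {"F1"},
          "B1": {"F2"}, "B2": {"F2"}, "B3": {"F2"}}

print(mlknn_predict(adj, labels, "X", k=2))
```

Because the walk spreads beyond direct neighbours, the pattern of `X` reflects annotations several steps away as well as adjacent ones, which is one way to read the abstract's claim that the method uses both local and global topology. Every node in `adj` needs at least one neighbour for the walk probabilities to sum to one.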
Keywords/Search Tags: Prediction of protein function, Protein-protein interaction networks, Random walking, Protein function annotation, KNN, Multi-label classification