
Study On Protein Function Prediction Based On Random Walk

Posted on: 2013-12-22
Degree: Master
Type: Thesis
Country: China
Candidate: X L Deng
GTID: 2230330371983435
Subject: Computer application technology
Abstract/Summary:
Since the sequencing of the human genome was completed, genetics has entered a period of significant theoretical and practical advances. Key to further study is a comprehensive understanding of the expression, function, and regulation of the proteins encoded by an organism, which is the subject of proteomics research. Proteomics encompasses a wide range of approaches and applications, such as explaining the complexity of biological processes at the molecular level, the differences among cell types, and the changes that occur in disease states. Protein function prediction is one of the most important research directions in proteomics.

Studies have shown that proteins rarely perform their functions alone in an organism. Analysis of protein function shows that proteins involved in the same cellular process interact with one another, so the function of an unknown protein may be predicted from its interactions with proteins of known function. Protein interactions can be used not only to predict protein function but also to model functional pathways and reveal the molecular mechanisms of cellular processes. The study of protein interactions is therefore fundamental to understanding the function of proteins in the cell.

In recent years, high-throughput methods such as the two-hybrid system, mass spectrometry, and protein chip technology have generated large amounts of protein-protein interaction data, making it possible to build genome-scale protein interaction networks from these heterogeneous data sources. However, many proteins in such networks are still of unknown function, and predicting their functions remains a major challenge.

Many existing techniques assume that proteins with similar functions are topologically adjacent in the interaction network. We instead assume that proteins with similar functions have similar annotation patterns, regardless of the distance between them in the protein interaction network.
By comparing the annotation pattern of an unknown protein with those of proteins of known function, we can predict the functions of the unknown protein.

Protein function prediction is a multi-label learning problem: the training set consists of proteins, each associated with a set of annotation labels, and the task is to predict the label set of each unknown protein by analyzing the training proteins of known function. We propose a three-phase approach. First, a random walk algorithm extracts the annotation pattern of a protein: the random walk finds proteins in close proximity to the initial protein in the network, and this neighborhood pattern is then transformed into an annotation pattern. Second, for each unknown protein, the traditional K-Nearest Neighbor (KNN) algorithm identifies its k nearest neighbors in the training set based on their annotation patterns. Finally, based on statistical information obtained from the label sets of these neighboring proteins, such as the number of neighbors belonging to each class, the maximum a posteriori (MAP) principle is used to determine the set of functional classes of the unknown protein. Tests on data sets constructed from yeast proteins show that the method predicts protein function effectively.

Future work can proceed in the following directions:
1) Combine other protein information, such as protein domains and protein sequences, when extracting annotation patterns.
2) Use, or combine the method with, other multi-label classification algorithms, such as BOOSTEXTER and RANK-SVM.
3) Take into account the hierarchical level information of FunCat.
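The abstract describes the three phases without formulas, so the following Python sketch shows one way they could fit together. It is a sketch under stated assumptions, not the thesis implementation. The assumptions are: the random walk is a random walk with restart (restart probability 0.3 and 50 iterations are hypothetical settings); the annotation pattern of a protein is, for each function label, the summed proximity of the other proteins carrying that label (an assumed definition); KNN uses cosine similarity (the abstract does not name the distance); and the MAP step follows the ML-KNN formulation with Laplace smoothing `s`, which matches the description but is not confirmed by the abstract. All function names and the toy network are hypothetical.

```python
import math


def random_walk_with_restart(adj, start, restart=0.3, iters=50):
    """Proximity of every protein to `start` via random walk with restart.

    adj maps each protein to the set of proteins it interacts with.
    restart and iters are hypothetical settings, not taken from the thesis.
    """
    p = {n: 0.0 for n in adj}
    p[start] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        for u, nbrs in adj.items():
            if nbrs:  # spread the non-restart mass evenly over interaction partners
                share = (1 - restart) * p[u] / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
        nxt[start] += restart  # jump back to the initial protein
        p = nxt
    return p


def annotation_pattern(adj, labels, protein, functions):
    """Neighborhood pattern -> annotation pattern (assumed definition):
    for each function, sum the proximity of *other* proteins annotated with it."""
    scores = random_walk_with_restart(adj, protein)
    return [sum(s for q, s in scores.items()
                if q != protein and f in labels.get(q, ()))
            for f in functions]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def k_nearest(pattern, train_patterns, k, exclude=None):
    """The k training proteins whose annotation patterns are most similar."""
    candidates = [q for q in train_patterns if q != exclude]
    candidates.sort(key=lambda q: cosine(pattern, train_patterns[q]), reverse=True)
    return candidates[:k]


def ml_knn_predict(adj, labels, unknown, functions, k=3, s=1.0):
    """MAP label-set prediction in the style of ML-KNN (s = Laplace smoothing)."""
    train = list(labels)
    pats = {q: annotation_pattern(adj, labels, q, functions) for q in train}
    m = len(train)
    # c1[f][j]: training proteins WITH f having j of their k neighbours labelled f;
    # c0[f][j]: the same count for training proteins WITHOUT f.
    c1 = {f: [0] * (k + 1) for f in functions}
    c0 = {f: [0] * (k + 1) for f in functions}
    for q in train:
        nbrs = k_nearest(pats[q], pats, k, exclude=q)
        for f in functions:
            j = sum(f in labels[n] for n in nbrs)
            (c1 if f in labels[q] else c0)[f][j] += 1
    prior = {f: (s + sum(f in labels[q] for q in train)) / (2 * s + m)
             for f in functions}
    predictions = {}
    for u in unknown:
        nbrs = k_nearest(annotation_pattern(adj, labels, u, functions), pats, k)
        predicted = set()
        for f in functions:
            j = sum(f in labels[n] for n in nbrs)
            like1 = (s + c1[f][j]) / (s * (k + 1) + sum(c1[f]))
            like0 = (s + c0[f][j]) / (s * (k + 1) + sum(c0[f]))
            # MAP decision: keep f if "has f" is the more probable hypothesis.
            if prior[f] * like1 > (1 - prior[f]) * like0:
                predicted.add(f)
        predictions[u] = predicted
    return predictions


# Hypothetical toy network: two separate cliques; "u" is the unknown protein.
def clique(names):
    return {n: {o for o in names if o != n} for n in names}


adj = {**clique(["a1", "a2", "a3", "a4", "u"]), **clique(["b1", "b2", "b3", "b4"])}
labels = {**{a: {"metabolism"} for a in ["a1", "a2", "a3", "a4"]},
          **{b: {"transcription"} for b in ["b1", "b2", "b3", "b4"]}}
print(ml_knn_predict(adj, labels, ["u"], ["metabolism", "transcription"]))
# → {'u': {'metabolism'}}
```

In this toy network the unknown protein `u` interacts only with proteins labelled "metabolism", so its annotation pattern matches theirs, its three nearest training neighbours all carry that label, and the MAP rule assigns it {"metabolism"} but not "transcription". Separating the neighborhood pattern (random-walk scores over proteins) from the annotation pattern (scores over labels) is what lets two distant proteins count as similar, in line with the assumption stated earlier in the abstract. A real evaluation would use the yeast interaction data and FunCat labels instead of these made-up cliques.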
Keywords/Search Tags: Protein Function Prediction, Protein-Protein Interaction Network, Classification, Random Walks, KNN, Multi-label learning