
Protein Function Prediction Based On The Sequence Circular Relationship Network

Posted on: 2016-11-15
Degree: Master
Type: Thesis
Country: China
Candidate: J W Guo
Full Text: PDF
GTID: 2180330473459922
Subject: Software engineering
Abstract/Summary:
With the completion of the Human Genome Project and the development of experimental techniques in biology during the genomic era of the early twenty-first century, huge volumes of biological sequence data are emerging. Analyzing these data and extracting useful information from them is challenging, and the study of protein function is an important research topic in this field. Protein function is usually determined by traditional wet-lab experiments, which require substantial manpower and costly materials and are also inefficient. It has therefore become common to apply computational methods such as data mining and intelligent computing to predict protein function. This study builds a Protein Circulation Relationship (PCR) network model using a sequence circular permutation matching algorithm, and then uses the PCR network to predict protein function. The contributions of this study are as follows:
1. An optimized prediction method that improves on traditional direct neighbor annotation. Based on the PCR network, a recommendation algorithm is adopted to predict protein function; this addresses the cold-start problem that arises when collaborative filtering recommendation is used for protein function prediction.
2. An improved Markov Clustering (MCL) algorithm for clustering the PCR network. The improved algorithm reduces the large number of fragmented clusters and speeds up the slow convergence of the original algorithm, making it more suitable for predicting protein function on the PCR network.
3. A novel method based on ranking the importance of network nodes. To account for the importance of protein nodes in the network, this study transforms the undirected PCR network into a directed network and then uses the PageRank (PR) node importance algorithm to compute each node's PR value. The method is also implemented on the Hadoop platform, whose parallel computing makes it efficient enough for huge genome databases.
The protein function prediction strategy proposed in this study is based on a PCR network built from a protein sequence database. The method helps to screen potential novel proteins for verification in the biological lab.
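As an illustration of the node-ranking step, the following is a minimal sketch of textbook PageRank by power iteration on a small directed graph. It is not the thesis's Hadoop implementation: the abstract does not say how edges of the PCR network are oriented, so the edge list, the protein identifiers `P1` to `P4`, the function name `pagerank`, and the damping factor of 0.85 are all assumptions made for the example.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a directed edge list of (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each node passes an equal share of its damped rank to every target.
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical protein identifiers; the edge directions are invented for illustration.
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P1"), ("P4", "P3")]
scores = pagerank(edges)
for protein, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(protein, round(score, 4))
```

In this toy graph `P3` receives links from two nodes and ends up ranked highest, while `P4`, which has no incoming links, is ranked lowest. A distributed version would typically express each iteration as one map and reduce pass, but that design is likewise an assumption and is not described in the abstract.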
Keywords/Search Tags: Bioinformatics, Prediction of Protein Function, Sequence Circulation Matching, Recommendation Algorithm, Clustering Algorithm, Node Importance in Network