Research Of Non-homology Computing Method Of Protein Function Prediction

Posted on: 2010-12-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Q Ma
Full Text: PDF
GTID: 1118360272997319
Subject: Computer Science and Technology
Abstract/Summary:
With the arrival of the post-genome era, the research focus of bioinformatics has shifted from sequencing to annotation. With the rapid development of large-scale sequencing tools, a large number of whole genomes have been sequenced. Relying solely on traditional experimental approaches to analyze the function of these newly sequenced data falls far short of current requirements. Designing computational annotation methods that predict the hidden biological function of these massive data has therefore become an important research topic in bioinformatics.

Computational methods for predicting protein function fall into two broad categories: homology-based methods and non-homology-based methods. Homology-based methods start from the sequence, analyze homology relationships with sequence alignment tools, and then infer the function of an unannotated sequence from sequences of known function. They are simple and effective, but their accuracy is often unsatisfactory, so in recent years there has been a tendency for non-homology methods to replace them. Non-homology-based methods predict the function of a sequence from its attributes, such as evolutionary distance and codon usage bias. They include the phylogenetic profile method, the gene adjacency method, protein-protein interaction methods, and methods based on correlated protein evolution rates. Of these, the phylogenetic profile and protein-protein interaction methods are the most widely used and have the greatest research value.

The phylogenetic profile is a comparative genomics method that predicts the function of biological molecules on a large scale from evolutionary information. It rests on the premise that functionally related proteins have the same or similar species distributions; in other words, they tend to be present together or absent together in the corresponding organisms.
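The premise can be illustrated with a minimal sketch in which each protein is represented by a presence/absence vector over a set of reference organisms, and two proteins are compared by counting the organisms in which their vectors disagree. The organism names, protein names, and presence data below are hypothetical, and the Hamming distance is only a simple stand-in: the dissertation measures profile correlation with weight-based profiles and improved clustering algorithms, which are not reproduced here.

```python
# Hypothetical reference organisms and presence/absence data, for illustration only.
REFERENCE_ORGANISMS = ["org_A", "org_B", "org_C", "org_D", "org_E"]


def build_profile(present_in, organisms=REFERENCE_ORGANISMS):
    """Return a binary vector: 1 if the protein (or a homolog) occurs in the organism."""
    return [1 if org in present_in else 0 for org in organisms]


def hamming_distance(p, q):
    """Count the reference organisms in which two profiles disagree."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same reference organisms")
    return sum(a != b for a, b in zip(p, q))


def most_similar(name, profiles):
    """Return the other protein whose profile is closest to the given one."""
    target = profiles[name]
    others = (k for k in profiles if k != name)
    return min(others, key=lambda k: hamming_distance(target, profiles[k]))


# Assumed toy data: protein_1 and protein_2 share the same species distribution.
profiles = {
    "protein_1": build_profile({"org_A", "org_C", "org_D"}),
    "protein_2": build_profile({"org_A", "org_C", "org_D"}),
    "protein_3": build_profile({"org_B", "org_E"}),
}
```

Under the premise, `most_similar("protein_1", profiles)` picks the protein with the matching species distribution, which would then be a candidate for a related function.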
The phylogenetic profile of a sequence is expressed as an N-dimensional vector, where N is the number of reference organisms and each component indicates whether the sequence or a homolog of it is present in the corresponding organism. The selection of reference organisms and the measurement of correlation between profiles are the two key steps of the phylogenetic profile method, and existing phylogenetic profile methods have all been improved along these two lines.

This paper analyzes methods for selecting reference organisms and proposes constructing weight-based phylogenetic profiles, so that the profiles effectively reflect the evolutionary information shared between sequences. In addition, the classical K-means and hierarchical clustering algorithms are improved and applied to measure the correlation between profiles. Experimental results show that the weight-based profiles and the improved clustering algorithms effectively improve the performance of the phylogenetic profile method.

Predicting protein-protein interactions offers another way to study protein function. Modern functional genomics holds that proteins carry out their functions through mutual interactions, so interacting proteins are typically functionally related. In this paper, we use the primary structure of proteins to predict protein-protein interactions. Statistical methods are used to generate several sequence features, such as hydrophobicity, molecular weight, polarity and average area buried, and these features are then normalized for the experiments. SVM and BP neural networks are then used to classify the two kinds of proteins. To verify the predictive ability of our method, we used the S. cerevisiae yeast dataset selected from MIPS, which includes 4837 interacting protein pairs and 9674 non-interacting protein pairs.
Experiments show that both the SVM and the BP neural network achieve relatively high accuracy: above 64% with the SVM, and above 87% under 10-fold cross-validation with the BP neural network. Both can therefore be used to predict protein-protein interactions.

This paper studies in depth and improves two main non-homology methods for predicting protein function. For the phylogenetic profile method, we choose organisms separated by large evolutionary distances as reference organisms, construct weight-based profiles, and finally use two improved clustering algorithms to analyze the correlation between protein profiles. For the protein-protein interaction method, we use SVM and BP neural networks to classify proteins. Experiments indicate that the methods we selected and improved all achieve higher sensitivity and specificity in predicting protein function.
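As a rough illustration of the sequence-feature step in the protein-protein interaction approach, the sketch below averages a per-residue property over each sequence, min-max normalizes a feature column, and concatenates the features of both proteins in a pair. It rests on several assumptions and is not the dissertation's implementation: the abstract does not name the property scales or the normalization used, so the Kyte-Doolittle hydropathy scale and min-max scaling are stand-ins, and all function names are hypothetical. The classifier itself (SVM or BP neural network) is omitted.

```python
# Kyte-Doolittle hydropathy values per amino acid. The abstract does not say
# which hydrophobicity scale was used, so this choice is an assumption.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_property(sequence, scale):
    """Average a per-residue property over a protein sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(scale[res] for res in sequence) / len(sequence)


def min_max_normalize(column):
    """Rescale a list of feature values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def pair_features(seq_a, seq_b, scales):
    """Concatenate the mean features of both proteins in a candidate pair."""
    return [mean_property(seq, scale) for seq in (seq_a, seq_b) for scale in scales]
```

In a fuller pipeline, each interacting or non-interacting pair would be turned into one such vector, each feature column would be normalized across the dataset, and the resulting vectors would be passed to the classifier.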
Keywords/Search Tags: Protein Function, Non-homology, Phylogenetic Profile, Weight-based Profile, Protein-Protein Interactions, Protein Primary Structure