Research On Protein Function Prediction Algorithm Based On Network Analysis

Posted on: 2019-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Tang
Full Text: PDF
GTID: 2370330566475960
Subject: Software engineering
Abstract/Summary:
Proteins are macromolecules involved in most vital biological activities in organisms. Recognizing their functions is important for advancing fields such as the life sciences, agriculture, and medicine. Traditional biological experiments for determining protein function are laborious and inefficient, consuming large amounts of manpower, material, and financial resources, and they cannot keep pace with the growing number of protein sequences. It is therefore necessary to predict protein function with computational methods, which can provide theoretical guidance for biological experiments and reduce experimental cost. With the development of high-throughput experimental technology, a substantial amount of Protein-Protein Interaction (PPI) data has been produced, and function prediction based on PPI networks has attracted growing attention, becoming a hot spot in bioinformatics in the post-genome era. This dissertation studies protein function prediction based on PPI networks. The main contents are as follows.

Firstly, a machine learning based approach, HPMM, is proposed, which combines Hierarchical Clustering (HC), Principal Component Analysis (PCA), and a Multi-layer Perceptron (MLP). HPMM considers both macro and micro perspectives. It incorporates information on protein families, domains, and important sites into the vertex attributes of the PPI network to alleviate the effect of noise in the network data. HC and PCA are first used to extract the features of functional modules and the principal components of the vertex attributes. A mapping from multiple features to multiple functions is then constructed by training the MLP model, and this mapping is used to predict protein functions. Three Homo sapiens PPI networks, annotated with molecular functions (MF), biological processes (BP), and cellular components (CC) respectively, were used in the experiments. HPMM was compared with the Cosine Iterative Algorithm (CIA) and the Diffusing GO Terms in the Directed PPI Network (GoDIN) algorithm. The experimental results indicated that HPMM obtains higher micro-accuracy, micro-precision, and micro-F1 than CIA and GoDIN, which are purely PPI network based methods.

Secondly, the BiWV algorithm is proposed to predict protein functions. It combines the global topological similarity produced by Random Walk with Resistance (RWS) with the semantic similarity between terms. In addition, the Bi-Weighted Vote algorithm with pathway (BiWV-P) is presented, which integrates biological pathway information. Using data sets for Saccharomyces cerevisiae and Homo sapiens, experiments compared the Transductive Multi-label Classifier (TMC), the Unbalanced Bi-Random Walk (UBiRW), Protein Function Prediction by Random Walks on a Hybrid Graph (ProHG), BiWV, and BiWV-P. The experimental results indicated that BiWV and BiWV-P predict protein functions effectively and achieve higher micro-accuracy and micro-F1 than the other algorithms on many data sets.

In summary, this dissertation studies methods of protein function prediction based on network analysis and proposes three algorithms: the machine learning based approach HPMM, BiWV, and BiWV-P. Experimental results on multiple indicators and data sets showed that the proposed methods can effectively predict protein functions and provide theoretical guidance for biological experiments.
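
The abstract does not give the BiWV scoring formula, so the following is only a minimal Python sketch of the general idea of a vote weighted by two similarities: each annotated protein votes for candidate terms, and each vote is scaled by a topological similarity to the query protein and by a semantic similarity between the candidate term and the voter's own terms. The function names (`weighted_vote`, `micro_f1`), the product-of-weights scoring rule, and the toy similarity values are all assumptions made for illustration. They are not the thesis's actual method, and the topological similarities are simply supplied as inputs rather than computed by RWS. A micro-F1 helper is included because the abstract reports that metric.

```python
from collections import defaultdict


def weighted_vote(protein, topo_sim, annotations, sem_sim, top_k=1):
    """Rank candidate terms for `protein` with a two-weight neighbor vote.

    topo_sim:    dict (query, other) -> topological similarity (assumed given)
    annotations: dict annotated protein -> set of function terms
    sem_sim:     dict (term_a, term_b) -> semantic similarity in [0, 1]
    Hypothetical scoring rule: sum over voters of topo weight * best
    semantic match between the candidate term and the voter's terms.
    """
    candidates = set()
    for terms in annotations.values():
        candidates |= terms

    scores = defaultdict(float)
    for voter, terms in annotations.items():
        w_topo = topo_sim.get((protein, voter), 0.0)
        if voter == protein or w_topo == 0.0 or not terms:
            continue
        for candidate in candidates:
            # Identical terms match fully; otherwise look the pair up
            # in either order and default to no similarity.
            w_sem = max(
                1.0 if candidate == t
                else sem_sim.get((candidate, t), sem_sim.get((t, candidate), 0.0))
                for t in terms
            )
            scores[candidate] += w_topo * w_sem

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k], dict(scores)


def micro_f1(predicted, actual):
    """Micro-averaged F1 over dicts mapping protein -> set of terms."""
    tp = fp = fn = 0
    for protein, truth in actual.items():
        guess = predicted.get(protein, set())
        tp += len(guess & truth)
        fp += len(guess - truth)
        fn += len(truth - guess)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    topo = {("P4", "P1"): 0.6, ("P4", "P2"): 0.3, ("P4", "P3"): 0.1}
    known = {"P1": {"GO:A"}, "P2": {"GO:B"}, "P3": {"GO:B"}}
    sem = {("GO:A", "GO:B"): 0.5}
    top, scores = weighted_vote("P4", topo, known, sem)
    print(top, scores)
    print(micro_f1({"P4": set(top)}, {"P4": {"GO:A", "GO:B"}}))
```

In this sketch the semantic weight lets a voter annotated with a related term contribute a partial vote, which is one plausible reading of combining topological similarity with term semantic similarity. Pathway information, as used in BiWV-P, is not modeled here.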
Keywords/Search Tags:Protein-Protein Interaction Network, Function Prediction, Machine Learning, Weighted Vote, Biological Pathway