Research Of Predicting Protein Function Based On Gene Ontology Dimensionality Reduction

Posted on: 2020-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Zhao
GTID: 2370330599956776
Subject: Computer application technology
Abstract/Summary:
Proteins are the most important functional carriers in the life activities of living cells and carry out a wide variety of important functions in an organism. Automatically annotating protein functions is one of the key tasks in bioinformatics in the post-genomic era. Accurate and comprehensive annotation of protein function not only helps us understand the mechanisms of life correctly, but also greatly advances research fields such as disease analysis, drug research and the improvement of crop production. The Gene Ontology (GO) is a functional annotation database that is widely used in protein function prediction. GO contains more than 45,000 functional terms with a complex structure, yet each protein is annotated with only a few to a few dozen of these terms, which poses a challenge to protein function prediction. This thesis combines gene ontology modeling with dimensionality reduction learning to study protein function prediction. The key contributions of the thesis are as follows:

(1) We propose a protein function prediction method based on hashing the gene ontology (HashGO). HashGO first uses the ontology graph structure to define the taxonomic similarity between GO terms. It then tailors a graph hashing method to optimize a series of hash functions that encode the massive set of GO terms as compact binary codes. Next, it uses these hash functions to compress the high-dimensional protein-term association matrix into a low-dimensional one. On that low-dimensional matrix, HashGO computes the semantic similarity between proteins based on Hamming distance. Finally, it predicts the missing annotations of a protein from the annotations of its semantic neighbors. Experimental results on archived GO annotations of Yeast and Human show the effectiveness and superiority of this method in replenishing missing annotations of proteins: HashGO not only predicts functions more accurately than other related approaches, but also runs faster than them.

(2) Because HashGO may not fully respect the hierarchical relationships of GO when hashing the ontology graph, we further propose a protein function prediction method based on gene ontology hierarchy preserving hashing (HPHash). HPHash first measures the hierarchical order relationships between GO terms. It then uses a hierarchy preserving hashing technique that keeps the hierarchical order between GO terms while optimizing a series of hash functions to encode the massive set of GO terms as compact binary codes. After that, HPHash uses these hash functions to project the protein-term association matrix into a low-dimensional space and performs semantic-similarity-based gene function prediction in that space. Experimental results on three model species (Human, Mouse and Rat) show that HPHash outperforms other related approaches and is robust to the number of hash functions.

(3) The above methods lack interpretability of the compressed labels and suffer from the inherent problem of thresholding labels in multi-label learning. To solve these problems, we introduce a protein function prediction method based on zero-one matrix factorization (ZOMF). ZOMF first factorizes the protein-term association matrix into two low-rank zero-one matrices and explores the latent relationships between proteins and terms. Next, it defines two smoothness terms on these two low-rank matrices, with respect to protein-protein interactions and the structural relationships between terms, to guide the optimization of the low-rank matrices. Finally, it reconstructs the association matrix from the two optimized low-rank matrices to predict protein function. Experimental results on four model species (Yeast, Arabidopsis, Mouse and Human) show that ZOMF predicts protein function annotations more accurately than existing algorithms, does not need to threshold the reconstructed matrix, and yields compressed zero-one labels with a more intuitive interpretation.
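HashGO's first step derives a taxonomic similarity between GO terms from the ontology graph. The abstract does not state the exact measure, so the following is only a minimal sketch under an assumed definition: the Jaccard overlap of two terms' ancestor sets in the GO directed acyclic graph (a term may have several parents). The helper names `ancestors` and `taxonomic_similarity` and the toy term names are hypothetical, not taken from the thesis.

```python
def ancestors(term, parents):
    """Return `term` plus all of its ancestors in a DAG given as a
    mapping child -> list of parent terms (a term may have several parents)."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def taxonomic_similarity(a, b, parents):
    """Assumed measure (not from the thesis): Jaccard overlap of the
    two terms' ancestor sets."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


# Hypothetical toy ontology (not real GO terms).
parents = {"a": ["root"], "b": ["root"], "c": ["a", "b"], "d": ["a"]}
print(taxonomic_similarity("c", "d", parents))  # 0.4
```

Terms that share more of their path to the root score higher, which is the intuition a taxonomy-based similarity is meant to capture; the thesis's own definition may weight ancestors differently.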
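The compress-then-compare pipeline shared by HashGO and HPHash (hash the protein-term matrix into short binary codes, measure protein similarity by Hamming distance, then borrow annotations from the nearest neighbors) can be sketched as below. This is not the thesis's method: the learned graph hashing (or hierarchy preserving hashing) functions are replaced by random Gaussian projections followed by a sign threshold, a standard locality-sensitive hashing stand-in, and the neighbor-voting rule, the parameter `k` and all function names are hypothetical.

```python
import random


def random_hash_functions(num_terms, num_bits, seed=0):
    """Hypothetical stand-in for learned hash functions: one Gaussian
    weight per (GO term, bit). HashGO/HPHash learn these from the
    ontology instead of drawing them at random."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(num_bits)]
            for _ in range(num_terms)]


def hash_proteins(Y, W):
    """Compress each protein's 0/1 term vector (a row of Y) into a
    binary code: bit b is 1 if the summed weights of its terms are > 0."""
    num_bits = len(W[0])
    codes = []
    for row in Y:
        code = []
        for b in range(num_bits):
            total = sum(W[t][b] for t, y in enumerate(row) if y)
            code.append(1 if total > 0 else 0)
        codes.append(code)
    return codes


def hamming(a, b):
    """Number of positions where two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_scores(Y, codes, query, k=2):
    """Score each term for protein `query` as the fraction of its k
    nearest neighbours (by Hamming distance, ties broken by index)
    that carry the term."""
    others = [j for j in range(len(Y)) if j != query]
    neighbours = sorted(
        others, key=lambda j: (hamming(codes[query], codes[j]), j))[:k]
    return [sum(Y[j][t] for j in neighbours) / len(neighbours)
            for t in range(len(Y[0]))]


# Toy example: 4 proteins x 4 GO terms, 8-bit codes.
Y = [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 0, 1, 1],
     [0, 0, 0, 1]]
codes = hash_proteins(Y, random_hash_functions(num_terms=4, num_bits=8))
print(predict_scores(Y, codes, query=0, k=1))
```

A term with a high score that is not yet in the query protein's row is a candidate missing annotation. The point of hashing is that Hamming distances on short codes are much cheaper to compute than similarities over tens of thousands of GO terms.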
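ZOMF's idea of reconstructing labels from two zero-one factors can be illustrated with a toy Boolean matrix factorization. The sketch assumes a Boolean product (an OR of ANDs), an objective that counts reconstruction errors plus Hamming-style smoothness penalties over protein-protein interaction edges and GO term-relation edges, and a greedy bit-flipping optimizer. The weights `alpha` and `beta`, the optimizer and all names are hypothetical and are not taken from the thesis.

```python
import random


def boolean_product(U, V):
    """Boolean product: Y_hat[i][t] = 1 if some latent factor k has
    U[i][k] = V[k][t] = 1. The result is already 0/1, so no threshold."""
    return [[1 if any(U[i][k] and V[k][t] for k in range(len(V))) else 0
             for t in range(len(V[0]))]
            for i in range(len(U))]


def objective(Y, U, V, ppi_edges, term_edges, alpha=0.5, beta=0.5):
    """Reconstruction errors plus two assumed smoothness penalties."""
    Y_hat = boolean_product(U, V)
    recon = sum(Y[i][t] != Y_hat[i][t]
                for i in range(len(Y)) for t in range(len(Y[0])))
    # Interacting proteins (i, j) should have similar latent rows in U.
    smooth_p = sum(U[i][k] != U[j][k]
                   for i, j in ppi_edges for k in range(len(V)))
    # Related GO terms (s, t) should have similar latent columns in V.
    smooth_t = sum(V[k][s] != V[k][t]
                   for s, t in term_edges for k in range(len(V)))
    return recon + alpha * smooth_p + beta * smooth_t


def factorize(Y, rank, ppi_edges, term_edges, sweeps=20, seed=0):
    """Greedy bit flipping: keep a flip only if it lowers the objective."""
    rng = random.Random(seed)
    U = [[rng.randint(0, 1) for _ in range(rank)] for _ in range(len(Y))]
    V = [[rng.randint(0, 1) for _ in range(len(Y[0]))] for _ in range(rank)]
    best = objective(Y, U, V, ppi_edges, term_edges)
    for _ in range(sweeps):
        improved = False
        for matrix in (U, V):
            for row in matrix:
                for c in range(len(row)):
                    row[c] ^= 1  # try flipping this bit
                    score = objective(Y, U, V, ppi_edges, term_edges)
                    if score < best:
                        best, improved = score, True
                    else:
                        row[c] ^= 1  # revert the flip
        if not improved:
            break
    return U, V, best


# Toy data: 3 proteins x 3 GO terms, one PPI edge and one term relation.
Y = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
U, V, best = factorize(Y, rank=2, ppi_edges=[(0, 1)], term_edges=[(0, 1)])
print(boolean_product(U, V), best)
```

Because `boolean_product(U, V)` is already 0/1, entries that are 1 in the reconstruction but 0 in `Y` can be read directly as predicted annotations with no threshold, and each latent column of `U` (row of `V`) is itself a readable group of proteins (terms). This mirrors the properties claimed for ZOMF, although the thesis's actual objective and optimization may differ.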
Keywords/Search Tags:Protein function prediction, Gene ontology, Graph hashing, Hierarchy preserving hashing, Matrix factorization