Research on Identifying Noisy Functional Annotations of Proteins

Posted on: 2018-01-14; Degree: Master; Type: Thesis
Country: China; Candidate: C Lu; Full Text: PDF
GTID: 2310330536473569; Subject: Computer application technology
Abstract/Summary:
Proteins are the carriers of the most important biological activities and perform a variety of important functions in an organism. Automatically annotating the functions of proteins is one of the key tasks in bioinformatics and the post-genomic era. Annotating protein function correctly is significant for research on the analysis and regulation of disease mechanisms, new drug research and development, the improvement of crop production, bioenergy development, and so on. Functional annotations of proteins are collected from multiple sources, so noisy annotations are inevitably introduced. These noisy annotations mislead the analysis and application of protein functions and reduce the accuracy of protein function prediction. However, current research on protein function prediction mostly focuses on predicting functions for completely unannotated proteins or replenishing missing annotations of proteins, and seldom pays attention to identifying noisy annotations of proteins. The key contributions of the thesis are as follows:

(1) We propose a method for identifying noisy functional annotations using taxonomic and semantic similarity (NoisyGOA). NoisyGOA first measures the taxonomic similarity between ontological terms using the GO hierarchy, and the semantic similarity between proteins. Then, it aggregates the maximal taxonomic similarity between the terms annotated to a protein and the terms annotated to its neighborhood proteins. After that, it takes the terms with the smallest aggregated scores as the noisy annotations of the protein. We compare NoisyGOA with alternative methods for identifying noisy annotations under different simulated cases of noisy annotations, and on archived GO annotations. Experiments on GOA files of S. cerevisiae, H. sapiens and A. thaliana show that NoisyGOA achieves higher accuracy than the alternative methods compared. These results demonstrate that both taxonomic similarity and semantic similarity contribute to the identification of noisy annotations.

(2) As NoisyGOA still suffers from noisy annotations when measuring the semantic similarity between proteins, and does not differentiate the reliability of different annotations, we propose another noisy annotation identification method called NoGOA. First, NoGOA applies sparse representation to the protein-term association matrix to reduce the impact of noisy annotations, and uses the sparse representation coefficients to measure the semantic similarity between proteins. Second, it preliminarily identifies the noisy annotations of a protein based on aggregated votes from the semantic neighborhood proteins of that protein. Next, NoGOA estimates the ratio of noisy annotations for each evidence code based on direct annotations in GOA files archived in different months, then weights the entries of the association matrix by the estimated ratios and propagates the weights to the ancestors of direct annotations using the GO hierarchy. Finally, it integrates the evidence-weighted association matrix and the aggregated votes to identify noisy annotations. Experiments on archived GOA files of S. cerevisiae, H. sapiens and A. thaliana demonstrate that NoGOA achieves significantly better results than other related methods, and that removing noisy annotations improves the performance of protein function prediction.
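The aggregation step described for NoisyGOA can be illustrated with a minimal Python sketch. It is not the thesis implementation: the term labels (T1, T2, T3, T9), the protein labels, the toy similarity values, and the choice to weight each neighbor by its protein similarity are all assumptions made for illustration only.

```python
# Toy term-term taxonomic similarities (hypothetical values, not derived from GO).
TOY_SIM = {
    frozenset(("T1", "T2")): 0.5,
    frozenset(("T1", "T3")): 0.7,
    frozenset(("T2", "T3")): 0.6,
    frozenset(("T1", "T9")): 0.1,
}

def toy_term_sim(a, b):
    """Symmetric lookup; identical terms have similarity 1, unknown pairs 0."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)

def aggregated_scores(protein, annotations, neighbors, term_sim):
    """For each term of `protein`, sum over neighbors the (weighted) maximal
    similarity between that term and the neighbor's terms."""
    scores = {}
    for term in annotations[protein]:
        total = 0.0
        for neighbor, weight in neighbors[protein].items():
            best = max(
                (term_sim(term, other) for other in annotations[neighbor]),
                default=0.0,
            )
            total += weight * best  # weighting by protein similarity is an assumption
        scores[term] = total
    return scores

def flag_noisy(scores, k=1):
    """Return the k terms with the smallest aggregated scores."""
    return sorted(scores, key=scores.get)[:k]

# Hypothetical proteins, annotations, and neighbor weights.
annotations = {"P1": {"T1", "T2", "T9"}, "P2": {"T1", "T3"}, "P3": {"T2", "T3"}}
neighbors = {"P1": {"P2": 0.9, "P3": 0.6}}

scores = aggregated_scores("P1", annotations, neighbors, toy_term_sim)
print(flag_noisy(scores))  # prints ['T9']
```

In this toy setting, T9 has almost no support among the terms of the neighboring proteins, so it receives the smallest aggregated score and is flagged as the likely noisy annotation.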
Keywords/Search Tags: protein function, noisy annotation, sparse representation, semantic similarity, taxonomic similarity