Kernel Methods For Biological Sequence Analysis

Posted on: 2010-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Yang
Full Text: PDF
GTID: 2120360275970258
Subject: Computer software and theory
Abstract/Summary:
Protein sequences can be classified into functional, structural, and subcellular location groups. One of the most important problems in computational biology is how to automate this classification procedure. Our kernel-based approach has two parts: constructing new string kernels, and developing new kernel-based learning algorithms.

First, we introduce a framework for modeling protein sequence similarities in the context of kernel methods. It offers a flexible way to construct various kernels for use with support vector machines (SVMs). Unlike existing methods that explicitly construct feature maps from protein sequences to a vector space, our framework builds the kernel functions directly through local kernel construction and kernel combination. The proposed framework yields biologically sound kernels by selecting discriminative k-length amino acid subsequences and taking into account mismatches, BLOSUM62 scores, InterPro entries, and Gene Ontology. We report experiments on two data sets, covering subcellular localization prediction and remote homology detection. Experimental results show that the constructed kernels, used with an SVM classifier, perform better than existing sequence-based methods. When prior knowledge from InterPro and Gene Ontology is incorporated, our method performs competitively with existing methods that use prior knowledge.

Second, we explore the interdependences between subcellular locations and incorporate them into SVMs for the prediction of protein subcellular localization. Traditional prediction systems use a "flat" structure of classifiers, such as the one-versus-all and one-versus-one schemes, with amino acid compositions to perform the prediction. Unlike existing studies that ignore the interdependences between subcellular locations, we use a hierarchical structure to organize the subcellular locations and model their relationships. We propose four kinds of hierarchical prediction methods and compare them on three data sets. Experimental results show that three of the hierarchical models outperform the traditional "flat" model in terms of tree loss. In particular, one hierarchical model outperforms the traditional "flat" model on all evaluation measures.
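
As a rough illustration of what a string kernel over k-length amino acid subsequences looks like, the sketch below implements the standard k-spectrum kernel and a weighted sum of normalized kernels. It is a stand-in under stated assumptions, not the kernels constructed in the thesis: it omits k-mer selection, mismatches, BLOSUM62 scores, InterPro, and Gene Ontology, and the function names, the choice of k values, and the equal weights are all hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every contiguous k-length subsequence (k-mer) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k):
    """k-spectrum kernel: inner product of the two k-mer count vectors."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(n * ct[kmer] for kmer, n in cs.items())


def normalized(kernel, s, t, **kw):
    """Cosine-normalize a kernel; returns 0.0 if either self-similarity is 0."""
    denom = sqrt(kernel(s, s, **kw) * kernel(t, t, **kw))
    return kernel(s, t, **kw) / denom if denom else 0.0


def combined_kernel(s, t, ks=(2, 3), weights=(0.5, 0.5)):
    """Kernel combination: non-negative weighted sum of normalized kernels.

    The k values and weights are illustrative assumptions only.
    """
    return sum(w * normalized(spectrum_kernel, s, t, k=k)
               for k, w in zip(ks, weights))
```

A Gram matrix built from `combined_kernel` over a set of sequences could then be passed to any SVM implementation that accepts precomputed kernels.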
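
The tree loss used to compare hierarchical and "flat" models can be sketched as follows. The abstract does not define it, so this assumes one common definition: the number of edges on the path between the true and the predicted label in the location hierarchy. The hierarchy and its label names are hypothetical and chosen only for illustration.

```python
# Hypothetical location hierarchy, stored as child -> parent (None marks the root).
PARENT = {
    "root": None,
    "intracellular": "root",
    "extracellular": "root",
    "nucleus": "intracellular",
    "cytoplasm": "intracellular",
}


def path_to_root(node, parent=PARENT):
    """Return the list of nodes from `node` up to and including the root."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def tree_loss(true_label, predicted_label, parent=PARENT):
    """Number of edges between the two labels in the hierarchy (assumed definition)."""
    up_true = path_to_root(true_label, parent)
    up_pred = path_to_root(predicted_label, parent)
    # Steps needed to climb from the true label to each of its ancestors.
    depth = {node: steps for steps, node in enumerate(up_true)}
    # The first shared ancestor met while climbing from the prediction
    # is the lowest common ancestor.
    for steps_pred, node in enumerate(up_pred):
        if node in depth:
            return depth[node] + steps_pred
    raise ValueError("labels do not share a root")
```

Under this definition, confusing two sibling locations costs less than confusing locations in different branches, which a flat 0/1 error cannot express.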
Keywords/Search Tags: string kernels, structured prediction, Support Vector Machines, protein sequence analysis, subcellular localization