Learning with multiple kernels: Semidefinite programming, duality, efficient optimization and applications in computational biology

Posted on:2006-01-23

Degree:Ph.D

Type:Thesis

University:University of California, Berkeley

Candidate:Lanckriet, Gert Rene Georges

GTID:2450390008955668

Subject:Engineering

Abstract/Summary:

An important challenge for the field of machine learning is to leverage the diversity of information available in large-scale learning problems, in which different sources of information often capture different aspects of the data. Beyond classical vectorial data formats, information in the form of graphs, trees, strings and other structures has become widely available. For example, in computational biology many such sources of information about genes and proteins are now available: sequence, expression, protein and regulation information. More data types are going to be available in the near future, such as array-based fitness profiles and protein-protein interaction data from mass spectrometry.

Recent work in computational biology (such as gene function prediction, prediction of protein structure and localization, and inference of regulatory and metabolic networks) could benefit significantly from an approach that treats the different types of information in a unified way, merging them into a single representation, rather than using only the description judged to be the most relevant to the task at hand.

In this thesis, a principled computational and statistical framework is introduced to integrate data from heterogeneous information sources in a flexible and unified way. The approach is formulated within the unifying learning framework of kernel methods and applied to the specific case of classification. Each data set is represented via a kernel function, which defines a generalized similarity relationship between pairs of entities, such as genes or proteins. The kernel representation is both flexible and efficient, and provides a principled framework in which many types of data can be represented, including vectors, strings, trees and graphs.

The resulting formulation takes the form of a semidefinite programming (SDP) problem. Although this implies a polynomial-time algorithm, the scale of many real-life problems is often beyond the reach of general-purpose SDP algorithms. Using tools from conic duality and convex analysis, a dedicated algorithm is derived that is significantly more efficient than generic SDP methods in this setting.

Finally, applications to computational biology are presented, showing that classification performance can be enhanced by integrating diverse genome-wide information sources.
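
As a rough illustration of merging several kernel representations into one, the sketch below builds two kernel matrices on the same toy data and combines them as a nonnegative weighted sum, which keeps the result positive semidefinite. The function names, the toy data and the fixed weights are all assumptions made for illustration; the thesis learns the combination by solving an SDP, which this sketch does not attempt.

```python
import numpy as np

def linear_kernel(X):
    """Gram matrix of inner products between rows of X."""
    return X @ X.T

def rbf_kernel(X, gamma=0.5):
    """Gaussian (RBF) Gram matrix; gamma is an illustrative choice."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative distances caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(d2, 0.0))

def combine_kernels(kernels, weights):
    """Nonnegative weighted sum of kernel matrices (hypothetical helper).

    The weights are fixed by hand here; they are NOT learned.
    """
    weights = np.asarray(weights, dtype=float)
    if len(kernels) != len(weights):
        raise ValueError("need one weight per kernel matrix")
    if np.any(weights < 0):
        raise ValueError("weights must be nonnegative to preserve PSD-ness")
    return sum(w * K for w, K in zip(weights, kernels))

def is_psd(K, tol=1e-8):
    """Check positive semidefiniteness via the smallest eigenvalue."""
    eigvals = np.linalg.eigvalsh((K + K.T) / 2.0)
    return bool(eigvals.min() >= -tol)

# Toy data standing in for two views of the same six entities.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
K_combined = combine_kernels([linear_kernel(X), rbf_kernel(X)], [0.3, 0.7])
print(K_combined.shape, is_psd(K_combined))
```

The combined matrix can be passed to any kernel classifier that accepts a precomputed Gram matrix. Restricting the weights to be nonnegative is one simple way to guarantee a valid kernel; it is a design choice of this sketch rather than a statement about the thesis's constraints.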

Keywords/Search Tags:

Computational biology, Information, Efficient, Kernel, Available, Sources

Related items

1	Pure sources and efficient detectors for optical quantum information processin
2	Efficient computational techniques for high dimensional stochastic modeling
3	Some Studies On Optimization Topics Of Computational Biology
4	New methods in computational systems biology
5	The Coupling Model Of Ethanol Metabolism Between Arabidopsis Pollen Tube And Stigma Based On The Computational Systems Biology
6	Obtaining Biology Information With Digital Image Processing And Computational Electromagnetics
7	The Designation And Implementation Of Biological Data Platform For Computational Researches
8	System Analysis Of Biology Computational Resources
9	An analytic approach to tensor scale with efficient computational solution and applications to medical imaging
10	On Efficient Computational Algorithm For Optimal Designs