Machine learning methods for the discovery of regulatory elements in bacteria

Posted on:2006-06-02

Degree:Ph.D

Type:Thesis

University:The University of Wisconsin - Madison

Candidate:Bockhorst, Joseph

GTID:2458390008470216

Subject:Biology

Abstract/Summary:

Technological advances are increasing both the volume and the kinds of biological data being generated. These data sets hold great promise for advances in biology and medicine. Because of their size, however, manual analysis is often impractical, and novel computational approaches are needed. This thesis investigates the use of machine learning methods for discovering an important class of DNA sequences, known as regulatory elements, that are encoded in the genomes of bacteria.

One set of contributions of this thesis relates to computational biology. We develop probabilistic models of three types of regulatory elements (promoters, terminators and operons). Key properties of our approach are that it combines heterogeneous evidence sources, predicts all three types of regulatory elements in a single model, and predicts regulatory elements in a set of bacterial genomes simultaneously. We present experiments showing that our promoter, terminator and operon predictions all exceed the previous state of the art in accuracy.

Another set of contributions relates to machine learning. Two of these contributions are novel methods for learning the parameters and structure of a probabilistic grammar. Our empirical evaluation shows that both approaches lead to improved accuracy on a terminator prediction task. Another machine learning contribution of this thesis is a semi-supervised approach to learning from "weakly-labeled" training examples. We show how to acquire and use weakly-labeled examples by exploiting relationships among concepts. Our empirical evaluation shows that these examples can increase accuracy for some training set sizes. A final machine learning contribution of this thesis is a probabilistic framework for representing and predicting overlapping elements in sequence data. Unlike hidden Markov models, which assign labels to individual positions of a sequence, our approach assigns labels to whole subsequences. Experiments designed to test the accuracy of our method show that our approach is more accurate than two alternatives. While each of these machine learning contributions is motivated by properties of the regulatory element discovery problem, they are general and apply to other domains as well.
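
The contrast between labeling individual positions and labeling whole subsequences can be illustrated with a small Python sketch. This is not the thesis's probabilistic framework; it only shows, on a hypothetical toy annotation, why a one-label-per-position representation cannot hold two overlapping elements. The function names, coordinates, and labels below are all assumptions made for illustration.

```python
from typing import List, Tuple

# A labeled subsequence: (start, end_exclusive, label). Hypothetical representation.
Segment = Tuple[int, int, str]


def segments_to_position_labels(
    length: int, segments: List[Segment], background: str = "-"
) -> List[str]:
    """Collapse labeled segments to one label per position.

    Where segments overlap, the later segment overwrites the earlier one,
    because each position can hold only a single label.
    """
    labels = [background] * length
    for start, end, label in segments:
        for i in range(start, end):
            labels[i] = label
    return labels


def position_labels_to_segments(
    labels: List[str], background: str = "-"
) -> List[Segment]:
    """Recover maximal runs of identical non-background labels."""
    segments: List[Segment] = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the sequence or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != background:
                segments.append((start, i, labels[start]))
            start = i
    return segments


# Toy annotation (made up): two elements that overlap on positions 6 and 7.
segments = [(2, 8, "promoter"), (6, 12, "terminator")]

per_position = segments_to_position_labels(14, segments)
recovered = position_labels_to_segments(per_position)

print(segments)   # the segment-level annotation keeps both full extents
print(recovered)  # the per-position round trip truncates the first element
```

In this toy case the round trip through per-position labels shortens the first element to positions 2 through 5, while the segment-level list preserves both elements in full.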

Keywords/Search Tags:

Machine learning, Regulatory, Methods

Related items

1	Machine Learning for Exploring State Space Structure in Genetic Regulatory Network
2	Investigation And Suggestions On Regulatory Problems Of MCN Company In Nanjing
3	Research On Methods Of Inferring Gene Regulatory Network Based On Gene Expression Profile
4	Attack And Defense Based On Machine Learning Explainability
5	Scalable Sparse Machine Learning Methods for Big Dat
6	Application of Machine Learning and Statistical Learning Methods for Prediction In A Large-Scale Vegetation Ma
7	Constructing Bank 1104 Regulatory Reporting System Based On Regulatory Report
8	Optimization methods in machine learning: Theory and applications
9	Research And Application On Machine Learning Methods For Health Assessment
10	Novel uses for machine learning and other computational methods for the design and interpretation of genetic microarrays