Pattern discovery in biological data sets

Posted on: 2008-12-22

Degree: Ph.D.

Type: Thesis

University: University of Pennsylvania

Candidate: Angelov, Stanislav Plamenov

GTID: 2448390005970588

Subject: Biology

Abstract/Summary:

In recent years, the volume of DNA and protein data produced by genome sequencing projects has grown rapidly. These data are studied closely for features that nature reuses, in order to understand the mechanisms of life. Many of these features are expressed as sequence patterns. Efficient computational methods for discovering biologically significant motifs are therefore highly desirable: they give researchers new insight into biological processes, the causes of disease, and the evolution of life.

There are two main approaches to extracting knowledge from sequence data. The first compares newly acquired data with existing, possibly already annotated, data under the assumption that data similarity implies functional similarity. The second mines the data for frequently occurring or surprising patterns. Such patterns are unlikely to occur at random, and they pinpoint candidates for further laboratory investigation.

In this thesis, we follow both approaches to extract useful information from biological data sets such as DNA and protein sequences and microarray-based gene expression profiles. Our contributions include linear-time and near-linear-time algorithms to enumerate short DNA substrings that contain evolutionary history, efficient algorithms for the design of composite patterns with application to PCR, and new techniques for automated protein domain discovery using correlation clustering. We also give fast exact and approximation methods for nonparametric analysis of gene expression data using isotonic regression. In addition to these theoretical results, we implement our methods and analyze the findings on real biological data.
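For readers unfamiliar with isotonic regression, the following is a minimal sketch of the standard pool-adjacent-violators procedure, which computes the least-squares non-decreasing fit to a sequence of values. It is a textbook illustration only and is not taken from the thesis; the function name `isotonic_fit` and the sample values are hypothetical.

```python
def isotonic_fit(values):
    """Least-squares non-decreasing fit via pool-adjacent-violators.

    Illustrative sketch only; not the method developed in the thesis.
    """
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge the last two blocks while the earlier block's mean exceeds
        # the later block's mean (compared by cross-multiplying the counts).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)  # every point in a block gets the block mean
    return fitted


# Hypothetical expression levels measured at increasing doses (made-up numbers).
levels = [1, 3, 2, 4]
print(isotonic_fit(levels))
```

Each out-of-order pair is pooled into a block and replaced by the block mean, so the result is the closest non-decreasing sequence in the least-squares sense, with no parametric form assumed for the trend.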

Keywords/Search Tags:

Data, DNA
