A machine learning approach for designing DNA sequence assembly algorithms

Posted on: 2004-04-10
Degree: Ph.D.
Type: Dissertation
University: Rensselaer Polytechnic Institute
Candidate: Lim, Darren Troy
Full Text: PDF
GTID: 1468390011474783
Subject: Computer Science
Abstract/Summary:
We present two separate algorithms for solving the DNA sequence assembly problem. The sequence assembly problem is the reconstruction of a large sequence of DNA from a set of subsequences called fragments. Fragments are created by breaking copies of the original DNA sequence at random intervals. This produces a set of fragments in which many fragments overlap with one another. Identifying these overlapping fragments is the key to re-forming the original strand.

The first algorithm identifies a "correct" series of fragment merges, that is, a series that reproduces the original sample from which the fragments were obtained. It enters each such series into a database of solutions, which is then used to sequence DNA other than the DNA used to create the database.

The second algorithm uses a k-mer based approach to identify overlapping regions in fragments. The method improves on the first algorithm in two ways: (1) it is designed to sequence real fragments, which differ in composition from simulated fragments; (2) it can be used to sequence much longer strands of DNA.

For both algorithms, the parameters of computation are learned through experimentation with previously assembled DNA sequences. Our experiments show that parameters learned on one set of DNA sequences can be used to successfully sequence a separate set of DNA sequences.
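The general idea of using shared k-mers to find overlapping fragments can be illustrated with a small Python sketch. This is not the dissertation's algorithm, whose details the abstract does not give; it is a minimal illustration under simplifying assumptions: fragments are error-free, all come from the same strand, and an overlap means an exact suffix-prefix match. The function names (`candidate_pairs`, `suffix_prefix_overlap`, `find_overlaps`) and the choice `k=4` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def candidate_pairs(fragments, k):
    """Return index pairs (i, j), i < j, of fragments sharing at least one k-mer."""
    index = defaultdict(set)  # k-mer -> ids of fragments containing it
    for frag_id, frag in enumerate(fragments):
        for kmer in kmers(frag, k):
            index[kmer].add(frag_id)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b, or 0 if shorter than min_len."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def find_overlaps(fragments, k=4):
    """Map ordered pairs (left, right) to their exact overlap length (assumed error-free reads)."""
    overlaps = {}
    for i, j in candidate_pairs(fragments, k):
        # Only pairs sharing a k-mer are checked, in both orientations.
        for left, right in ((i, j), (j, i)):
            n = suffix_prefix_overlap(fragments[left], fragments[right], k)
            if n:
                overlaps[(left, right)] = n
    return overlaps


fragments = ["ACGTTGCA", "TGCAGGTC", "GGTCAATT"]
overlaps = find_overlaps(fragments, k=4)
```

The k-mer index restricts the expensive pairwise comparison to fragments that share at least one k-mer, which is what makes this kind of approach attractive for longer strands. Real fragments contain sequencing errors and may come from either strand, so a practical assembler would need approximate matching and reverse-complement handling, both omitted here.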
Keywords/Search Tags: DNA sequence, Algorithms, Fragments