Identification of protein coding regions in microbial genomes using unsupervised clustering

Posted on:2010-01-21

Degree:M.S.

Type:Thesis

University:University of Nevada, Reno

Candidate:Konda, Jayashree


GTID:2448390002483707

Subject:Biology

Abstract/Summary:

At present, the genomes of many organisms have been sequenced, meaning that their nucleotide sequences are known, but the locations of genes and, most importantly, of the coding regions are unknown. Identifying coding regions is of vital importance because they code for proteins. Distinguishing between coding and non-coding regions is a difficult undertaking that has been the subject of many research efforts. We describe here an unsupervised clustering algorithm for identifying protein coding regions in microbial genomic DNA sequences. The algorithm is based on a simple measure, the vector of nucleotide frequencies in a sliding window, and uses an ab initio iterative Markov modeling procedure to partition genomic sequences into three classes: coding, coding on the opposite strand, and non-coding regions. The algorithm is very efficient and can be applied to any type of microbial genome, including those of uncharacterized microorganisms. Building on a method developed by Audic and Claverie [18], we improved the accuracy of finding coding regions and also located the nearest transition point from one class to another with an accuracy matching or exceeding that of the best currently used gene detection methods. The method was evaluated on 18 complete microbial genomes from GenBank, which cover four major phylogenetic lineages (Gram-negative, Gram-positive, cyanobacteria, and archaea). The results showed improved performance in predicting the coding regions of microbial genomes.
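The sliding-window measure mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function name `window_frequencies`, the default window length of 96, and the step of 12 are assumptions chosen only for illustration, and the clustering and iterative Markov modeling stages are omitted entirely.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def window_frequencies(sequence, window=96, step=12):
    """Return a list of (start, [fA, fC, fG, fT]) for each full window.

    `window` and `step` are hypothetical defaults, not values from the thesis.
    Symbols other than A, C, G, T (for example N) are not counted in the
    vector but still contribute to the window length used as denominator.
    """
    sequence = sequence.upper()
    vectors = []
    # Slide over the sequence, keeping only windows that fit completely.
    for start in range(0, len(sequence) - window + 1, step):
        counts = Counter(sequence[start:start + window])
        vectors.append((start, [counts[n] / window for n in NUCLEOTIDES]))
    return vectors


if __name__ == "__main__":
    for start, freqs in window_frequencies("ACGTACGTAA", window=4, step=2):
        print(start, freqs)
```

Each window yields one four-component vector. Under the approach described in the abstract, vectors like these would be the input that the unsupervised procedure assigns to the coding, opposite-strand coding, or non-coding class.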

Keywords/Search Tags:

Coding regions, Genomes
