Statistical methods for serial analysis of gene expression

Posted on:2004-04-29

Degree:Ph.D

Type:Thesis

University:The Johns Hopkins University

Candidate:Blades, Natalie Jean

Full Text:PDF

GTID:2463390011459027

Subject:Biology

Abstract/Summary:

PDF Full Text Request

Serial analysis of gene expression (SAGE) is a technique for obtaining information about gene expression. SAGE experiments provide insight into human disease by identifying disease-related genes and by suggesting possible therapeutic targets. Data from a SAGE experiment consist of long lists of gene identifiers (tags) and their corresponding frequencies. Most of the entries in these lists are tags that appear only a few times. Some of these low-frequency tags represent low-frequency mRNAs, but others are the result of sequencing errors, and it is difficult to distinguish between the two cases. This thesis presents methods for enhancing the signal from infrequently occurring tags.

The frequency distributions of tags display a remarkable regularity across cell types and species. The first technique exploits this regularity in two ways: it automatically discounts low counts that cannot reliably be used to compare a specific gene's expression level across conditions, and it transforms the counts to a scale that yields more reliable correlation and clustering of genome-wide expression profiles.

The second contribution is a method for calculating the error rate in any library. We observe a linear relationship between the copy number of a given tag and the number of observed tags that differ from it by a single-base substitution, insertion, or deletion. We have found that the slope of this relationship can be transformed to give an estimate of the error rate.

Finally, we identify the likely erroneously generated tags. We develop a model for reassigning these erroneously read tags by identifying probable errors and the tags that spawned them. An error in one base of a very common tag may result in the observation of a completely new but similar tag: a shadow of the common tag. Infrequently observed tags that are very similar to other observed tags may have been created by such a process. On the other hand, infrequently observed tags that are not similar to any other observed tag may represent genuinely rare transcripts. The proposed method reassigns the erroneously observed shadows to the tags that likely generated them.
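The error-rate and shadow-reassignment ideas above can be illustrated with a minimal Python sketch. It is not the thesis's method; it only makes the abstract's description concrete under stated assumptions. The function names `one_edit_neighbors`, `neighbor_slope`, and `reassign_shadows` are hypothetical. The sketch assumes fixed-length tags, so an insertion pushes the last base out of the read window and a deletion pulls an arbitrary base in at the end. It counts the *distinct* observed one-edit neighbors of each tag, which is our reading of "the number of tags observed". The thresholds `max_shadow` and `min_ratio` are illustrative guesses. The abstract does not say how the slope is transformed into an error rate, so the sketch stops at the slope.

```python
from collections import Counter

BASES = "ACGT"


def one_edit_neighbors(tag):
    """Same-length tags one substitution, insertion or deletion away from `tag`.

    Assumption: tags are fixed-length reads, so an insertion pushes the last
    base out of the window and a deletion pulls an arbitrary base in at the end.
    """
    n = len(tag)
    out = set()
    for i in range(n):
        for b in BASES:
            out.add(tag[:i] + b + tag[i + 1:])      # substitution at position i
            out.add((tag[:i] + b + tag[i:])[:n])    # insertion before i, truncated to n
            out.add(tag[:i] + tag[i + 1:] + b)      # deletion at i, base b shifts in
    out.discard(tag)  # "substituting" a base with itself is not an error
    return out


def neighbor_slope(counts):
    """Least-squares slope of (# distinct observed one-edit neighbors) on copy number.

    The transform from this slope to an error rate is not given in the
    abstract, so this sketch returns only the slope.
    """
    observed = set(counts)
    xs = list(counts.values())
    ys = [len(one_edit_neighbors(t) & observed) for t in counts]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct copy numbers")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def reassign_shadows(counts, max_shadow=2, min_ratio=20):
    """Fold rare tags into a much more abundant one-edit neighbor ("parent").

    max_shadow and min_ratio are hypothetical thresholds for illustration only.
    Parents are chosen from the original counts; with these defaults a parent
    (count >= 20) can never itself be a shadow (count <= 2).
    """
    result = Counter(counts)
    for tag, c in counts.items():
        if c <= 0 or c > max_shadow:
            continue
        parents = [p for p in one_edit_neighbors(tag)
                   if p in counts and counts[p] >= min_ratio * c]
        if parents:
            parent = max(parents, key=lambda p: (counts[p], p))  # most abundant wins
            result[parent] += c
            del result[tag]
    return result


if __name__ == "__main__":
    library = {"AAAAAAAAAA": 100, "AAAAAAAAAC": 1, "CGTACGTACG": 1}
    print(neighbor_slope(library))
    print(reassign_shadows(library))
```

In the toy library, `AAAAAAAAAC` is a single-substitution shadow of the abundant `AAAAAAAAAA` and is folded into it. `CGTACGTACG` has no abundant neighbor and is kept as a possibly genuine rare transcript. A real analysis would fit the slope only over the range where the relationship is linear. It would also choose the thresholds from the library's estimated error rate rather than fixing them by hand.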

Keywords/Search Tags:

Gene, Tags, Expression, Observed, SAGE, Infrequently

PDF Full Text Request

Related items

1	The SAGE Construction And Analysis Before And After Heat Treated In The 5th-Instar Larvae Of The Male Silkworm, Bombyx Mori
2	Production of transgenic eastern oysters
3	Long-SAGE Library Construction Of Different Feeding Habit Silkworm Varieties On Artificial Diet And Differential Expression Genes Analysis
4	SAGE Transcript Profiles Of Vero Cells Infected By Infectious Bursal Disease Virus (IBDV), And Transformed With Genes Encoded By IBDV A-Segment
5	Studies On The Construction Of cDNA Expression Library From Adult Fasciola Gigantica And Its Expressed Sequence Tags
6	Genetic diversity and paternity analysis of endangered Canadian greater sage-grouse (Centrocercus urophasianus)
7	Digital Gene Expression Analysis Of Two Temperature-Sensitive Mutants In Magnaporthe Oryzae And Related Genes Knockout
8	Cloning And Expression Of Anthocyanin Biosynthesis Related Structure Genes In Vitis Amurensis
9	Study On Gene Expression In Magnaporthe Grisea By Expressed Sequence Tags And cDNA Array Monitoring
10	Analysis Of Expressed Sequence Tags Of Heart Tissues From Two Breeds Of Pigs