Statistical methods for serial analysis of gene expression

Posted on: 2004-04-29 | Degree: Ph.D. | Type: Thesis | University: The Johns Hopkins University | Candidate: Blades, Natalie Jean | GTID: 2463390011459027 | Subject: Biology

Abstract/Summary:

Serial analysis of gene expression (SAGE) is a technique for obtaining information about gene expression. SAGE experiments provide insight into human disease by identifying disease-related genes and by suggesting possible therapeutic targets. Data from a SAGE experiment consist of long lists of gene identifiers (tags) and corresponding frequencies. Most of the entries in these lists are tags that appear only a few times. Some of these low-frequency tags represent low-frequency mRNAs, but others are the result of sequencing errors, and it is difficult to distinguish between the two cases. This thesis presents methods for enhancing the signal from infrequently occurring tags.

The frequency distributions of tags display a remarkable regularity across cell types and species. The first technique exploits this regularity to automatically discount low counts that cannot reliably be used to compare expression levels of a specific gene across conditions, and to transform the cell counts to a scale that produces more reliable correlation and clustering of genome-wide expression profiles.

The second contribution is a method for calculating the error rate in any library. We observe a linear relationship between the copy number of a given tag and the number of observed tags that differ from it by a single-base substitution, insertion, or deletion. The slope of this relationship can be transformed to give an estimate of the error rate.

Finally, we identify tags that were likely generated by error. We develop a model for reassigning these erroneously read tags by identifying probable errors and the tags that spawned them.
An error in one base pair of a very common tag may result in the observation of a completely new, but similar, tag: a shadow of the common tag. Infrequently observed tags that are very similar to other observed tags may have been created by such a process. On the other hand, infrequently observed tags that are not similar to any other observed tags may represent genuinely infrequently expressed transcripts. The proposed method reassigns the erroneously observed shadows to the tags that may have generated them.

Keywords/Search Tags: Gene, Tags, Expression, Observed, SAGE, Infrequently
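
The shadow-tag idea described in the abstract can be illustrated with a small Python sketch. This is not the model developed in the thesis, which the abstract does not specify; it is a naive threshold heuristic under stated assumptions. Tags are treated as fixed-length strings over A, C, G, T, only single-base substitutions are considered (insertions and deletions are left out for simplicity), and the function names and both thresholds (`max_shadow_count`, `min_parent_ratio`) are hypothetical values chosen for illustration.

```python
from collections import Counter

BASES = "ACGT"


def substitution_neighbors(tag):
    """Yield every tag that differs from `tag` by exactly one base substitution."""
    for i, base in enumerate(tag):
        for alt in BASES:
            if alt != base:
                yield tag[:i] + alt + tag[i + 1:]


def reassign_shadows(counts, max_shadow_count=1, min_parent_ratio=50):
    """Fold low-count tags into a far more abundant one-substitution neighbor.

    Both thresholds are hypothetical illustration values, not taken from the thesis.
    """
    cleaned = Counter(counts)
    for tag, n in counts.items():
        if n > max_shadow_count:
            continue  # only rare tags are shadow candidates
        # Candidate parents: observed neighbors that are much more abundant.
        parents = [
            (counts[nb], nb)
            for nb in substitution_neighbors(tag)
            if nb in counts and counts[nb] >= min_parent_ratio * n
        ]
        if parents:
            _, parent = max(parents)  # most abundant candidate parent
            cleaned[parent] += n      # give the shadow's count back to its parent
            del cleaned[tag]
    return cleaned


# Toy library (made-up tags and counts, for illustration only).
library = {
    "AAAAAAAAAA": 200,  # abundant tag
    "AAAAAAAAAC": 1,    # one substitution away from the abundant tag: likely shadow
    "CGTACGTTGA": 1,    # no similar tag observed: kept as a possibly rare transcript
}
cleaned = reassign_shadows(library)
```

In this toy library the rare tag that sits one substitution away from the abundant tag is folded into it, while the rare tag with no similar neighbor is kept, mirroring the distinction the abstract draws between shadows and genuinely infrequent transcripts. The total tag count is preserved by the reassignment.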