
Semi-parametric simulation of AffyMetrix microarrays to obtain realistic output

Posted on: 2011-05-02
Degree: Ph.D
Type: Dissertation
University: Southern Methodist University
Candidate: Hardin, Andrew
Full Text: PDF
GTID: 1448390002954634
Subject: Biology
Abstract/Summary:
AffyMetrix Gene Expression Microarrays are used by biologists to detect the presence of mRNA in a biological sample of interest. The presence of a specific strand of mRNA implies the presence of a corresponding protein. Microarrays are capable of detecting the presence of mRNA for all the genes in a species' genome.

Detecting mRNA requires a fluorescent scanner, which digitally records an intensity proportional to the amount of light emitted from each section of the array. However, the recorded intensity does not exactly match the actual mRNA level. Researchers are interested in detecting differences in mRNA levels between distinct biological samples, such as between cancer cells and normal cells. Genes that show such differences are called differentially expressed genes, and statistical methods are used to detect them.

Statistical methods vary in effectiveness, and statisticians and bioinformaticians are interested in creating and improving them. This process requires some way of determining whether a given method is actually detecting differential expression accurately. One way of comparing methods is to apply each one to specially designed arrays that have known levels of differential expression. The results can then be compared to determine which methods perform best.

In practice it is difficult to construct arrays with known levels of expression, and only a very few are available to researchers. Researchers can use other methods for detecting differential expression on a gene-by-gene basis, but these methods are very time consuming, particularly if one must check hundreds of putative differentially expressed genes. Statistical simulation, a computational approach for constructing an array with characteristics nearly identical to those of a laboratory-constructed array, offers a cost-efficient way of constructing arrays where both the location and the level of expression for each gene can be specified.

To produce a simulation of this type it is necessary to explore and recreate the properties present in real experiments. AffyMetrix arrays are quite complex to simulate. Individual genes are broken into smaller sections (called probes). Each probe is paired with a nearly identical strand of DNA that differs by only a single base pair, producing a pairing of Perfect Match (PM) and Mismatch (MM). A collection of 11 to 20 probes interrogates each gene, and this collection is called a probe set. Each probe within a probe set behaves differently; therefore the simulation model must be flexible enough to capture this variability.

Analysis of a large set of AffyMetrix experiments was used to determine the simulation model. This simulation model uses a semiparametric location/scale model based upon nonparametric bootstrapping of real experimental data. The resulting simulated arrays reproduce the key characteristics of real data. An example of a simulated set of arrays is shown and discussed. The simulated data are then analyzed using common analysis methods to illustrate how simulated arrays can be used by researchers in practice.
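
The abstract does not give the model's details, so the following is only a minimal sketch of what a location/scale simulation with nonparametrically bootstrapped residuals could look like for a single probe set. Everything in it is an assumption made for illustration: the function name `simulate_probe_set`, the use of the median and the median absolute deviation as per-probe location and scale, the pooling of residuals across probes, and the additive `shift` standing in for a specified level of differential expression on the log scale. It is not the dissertation's actual model.

```python
import numpy as np

def simulate_probe_set(log_pm, n_arrays, shift=0.0, rng=None):
    """Simulate log-intensities for one probe set (illustrative sketch only).

    log_pm   : array of shape (n_probes, n_real_arrays) holding real
               log-scale PM intensities for one probe set (assumed input).
    n_arrays : number of simulated arrays to generate.
    shift    : hypothetical log-scale differential expression to add.
    """
    rng = np.random.default_rng() if rng is None else rng
    log_pm = np.asarray(log_pm, dtype=float)

    # Per-probe location and scale, so each probe keeps its own behaviour.
    loc = np.median(log_pm, axis=1, keepdims=True)
    scale = np.median(np.abs(log_pm - loc), axis=1, keepdims=True)
    scale = np.where(scale > 0, scale, 1.0)  # guard against constant probes

    # Standardized residuals from the real data, pooled over probes.
    resid = ((log_pm - loc) / scale).ravel()

    # Nonparametric bootstrap: resample residuals with replacement.
    draws = rng.choice(resid, size=(log_pm.shape[0], n_arrays), replace=True)

    # Rebuild simulated intensities on each probe's own location and scale.
    return loc + shift + scale * draws

# Usage with made-up stand-in data (11 probes, 6 real arrays).
real = np.random.default_rng(0).normal(8.0, 1.0, size=(11, 6))
control = simulate_probe_set(real, n_arrays=4, rng=np.random.default_rng(1))
treated = simulate_probe_set(real, n_arrays=4, shift=1.0,
                             rng=np.random.default_rng(1))
```

Because the known `shift` is set by the person running the simulation, any analysis method applied to `control` and `treated` can be scored against a known truth, which is the use case the abstract describes.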
Keywords/Search Tags: Arrays, Simulation, Affymetrix, Methods, Used, mRNA, Expression, Real