
Analysis of microarray data

Posted on: 2006-05-17
Degree: Ph.D.
Type: Thesis
University: Yale University
Candidate: Duan, Fenghai
Full Text: PDF
GTID: 2450390005495570
Subject: Biology
Abstract/Summary:
DNA microarray data analysis has been an active statistical topic in recent years because of its wide application in biomedical fields and its complicated data structure. In this thesis, I discuss three levels of microarray data analysis: normalizing Affymetrix GeneChip® arrays, identifying significantly differentially expressed (SDE) genes, and clustering similarly expressed genes into groups.

In the first section, I focus on resolving a particular spatial effect on the images of certain Affymetrix GeneChip® arrays, which I call the texture effect. I show that the common normalization methods fail to correct the texture effect, which in turn affects the identification of differentially expressed genes. To resolve this problem, I explore a way to assess and correct the texture effect by modeling its correlation structure and its periodicity.

In the second section, I compare the performance of several approaches for identifying differentially expressed genes from the probe-level data of Affymetrix GeneChip® arrays. I focus on the comparison between summarization methods and non-summarization methods. For the summarization methods, I first present a theoretical result showing that the difference between MAS 5 (a single-chip approach) and RMA (a multi-chip approach) actually comes from the incorporation of mismatch probes and the use of different robust algorithms, rather than from their "single-chip" and "multi-chip" properties. For the non-summarization methods, I compare fixed probe-effect modeling with random probe-effect modeling (RPM) in the identification of SDE genes. I show that fixed probe-effect modeling, like the summarization methods (MAS 5 and RMA), tends to be over-optimistic in estimating the variances during the identification of SDE genes. In simulation studies, random probe-effect modeling performs much better than the other methods with respect to coverage probability. The Affymetrix spike-in dataset and a mouse dataset are used to demonstrate the advantage of random probe-effect modeling.

In the last section, I first show that for the widely cited yeast cell cycle data of Spellman et al. (1998), the standard clusterings are deficient because of loss of synchrony. I then propose a method to improve the performance of k-means by assigning decreasing weights to its variables, and I evaluate this "weighted k-means" on a simulated dataset and on the Spellman et al. (1998) yeast cell cycle data. Protein complexes from a public website are used as biological benchmarks. Results show that an exponentially decreasing weight function assigned to the variables of k-means generally increases the agreement between protein complexes and k-means clusters.
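The weighted k-means idea described above can be illustrated with a minimal sketch. This is not the thesis implementation: the weight form w_t = exp(-rate * t), the function names, the explicit initial centers, and the toy data are all assumptions made for illustration. The abstract states only that an exponentially decreasing weight is assigned to the variables; here the variables are taken to be ordered time points, so later (less synchronized) time points contribute less to the distance.

```python
import math

def exp_weights(n_vars, rate):
    """Assumed weight function: w_t = exp(-rate * t) for t = 0..n_vars-1."""
    return [math.exp(-rate * t) for t in range(n_vars)]

def weighted_sq_dist(x, c, w):
    """Squared Euclidean distance with one weight per variable."""
    return sum(wt * (xt - ct) ** 2 for xt, ct, wt in zip(x, c, w))

def weighted_kmeans(data, init_centers, weights, max_iter=100):
    """Lloyd-style k-means using the weighted distance above.

    Initial centers are passed explicitly so the result is deterministic.
    With positive per-variable weights, the center that minimizes the
    weighted within-cluster sum of squares is still the plain mean.
    """
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center under the weighted distance.
        new_labels = [
            min(range(len(centers)),
                key=lambda j: weighted_sq_dist(x, centers[j], weights))
            for x in data
        ]
        if new_labels == labels:  # assignments stable, so stop
            break
        labels = new_labels
        # Update step: per-variable mean of each non-empty cluster.
        for j in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy profiles: the first two "time points" carry the group signal,
# the last one is large but uninformative.
data = [[0, 0, 5], [0, 0, -5], [1, 1, 5], [1, 1, -5]]
init = [data[0], data[3]]

uniform_labels, _ = weighted_kmeans(data, init, [1.0, 1.0, 1.0])
weighted_labels, _ = weighted_kmeans(data, init, exp_weights(3, 3.0))
print(uniform_labels)   # [0, 1, 0, 1]: split driven by the last variable
print(weighted_labels)  # [0, 0, 1, 1]: split driven by the early variables
```

Because the weights act per variable, the same clustering could be obtained by scaling each variable by the square root of its weight and running an ordinary k-means; the explicit distance function is used here only to keep the weighting visible.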
Keywords/Search Tags:Data, Microarray, Probe-effect modeling, Texture effect, Summarization methods, K-means