
Clustering of mixed data types with application to toxicogenomics

Posted on: 2006-06-20
Degree: Ph.D.
Type: Dissertation
University: North Carolina State University
Candidate: Bushel, Pierre Robert
Full Text: PDF
GTID: 1458390008472671
Subject: Biology
Abstract/Summary:
DNA microarray analysis provides unprecedented capabilities for simultaneous measurement of genome-wide alterations in transcription levels. Toxicogenomics bridges gene and protein expression analyses with conventional toxicology to elucidate a global view of the toxic outcomes and mechanistic changes elicited in biological systems by toxicant exposure and environmental stressors. Inherent in toxicogenomics data are systematic error, stochastic variation, and disparate measurement domains and types, which complicate the acquisition of significant, meaningful and broad biological interpretations from analysis of the data. In this dissertation, a classification regimen composed of analysis of replicate data, outlier diagnostics and gene selection procedures was employed to utilize microarray data for categorization of sub-classes of biological samples exposed to pharmacologic agents. To assess contrasts of centrilobular congestion severity of the rat liver subsequent to exposure to acetaminophen (APAP), microarray data, clinical chemistry evaluations and histopathology observations were integrated in a database and analyzed using mixed linear model approaches. Finally, the k-prototype algorithm was modified (Modk-prototypes) to the specifications of k-means clustering. Its mixed objective function is the sum of two terms: the squared Euclidean distance, which measures the dissimilarity of samples based on microarray and clinical chemistry numeric data features, and simple matching, which measures the dissimilarity of samples based on histopathology features with categorical values. In addition, the objective function included weighting terms for the microarray, clinical chemistry and histopathology domain data in order to computationally integrate the data as well as constrain the clustering of the APAP-treated samples according to similarity of gene expression and toxicological profiles.
Simulated annealing optimization of the Modk-prototypes algorithm (SA-Modk-prototypes) was used to validate the clustering of the APAP-treated samples. The clusters were vetted for gene expression and toxicological (VETed) k-prototypes features that discerned clusters from one another. The VETed k-prototypes are shown to be ideal for distinguishing between zero, minimal, and moderate levels of necrosis of the hepatocytes and centrilobular region of the rat liver that are end-point representations of the clusters of APAP-treated samples. (Abstract shortened by UMI.)
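
The mixed objective described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names, the tuple layout of a sample, and the three weight parameters (`w_expr`, `w_chem`, `w_histo`) are assumptions made for illustration, and the abstract does not state how the weighting terms were actually defined or chosen. The sketch only shows the general idea of summing squared Euclidean distance over numeric features and simple matching over categorical features, then assigning each sample to its nearest prototype.

```python
def squared_euclidean(a, b):
    """Sum of squared differences between two equal-length numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def simple_matching(a, b):
    """Number of categorical features whose values differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def mixed_dissimilarity(sample, prototype, w_expr=1.0, w_chem=1.0, w_histo=1.0):
    """Weighted mixed dissimilarity between a sample and a prototype.

    Each argument is a (expression, chemistry, histopathology) tuple:
    the first two parts are numeric vectors, the third is categorical.
    The weights are hypothetical stand-ins for the domain weighting terms.
    """
    s_expr, s_chem, s_histo = sample
    p_expr, p_chem, p_histo = prototype
    return (
        w_expr * squared_euclidean(s_expr, p_expr)
        + w_chem * squared_euclidean(s_chem, p_chem)
        + w_histo * simple_matching(s_histo, p_histo)
    )


def assign_to_prototypes(samples, prototypes, **weights):
    """Return, for each sample, the index of its least-dissimilar prototype."""
    labels = []
    for sample in samples:
        costs = [mixed_dissimilarity(sample, p, **weights) for p in prototypes]
        labels.append(costs.index(min(costs)))
    return labels


# Toy values, invented purely for illustration (not data from the study).
prototypes = [
    ([0.0, 0.0], [1.0], ["none"]),
    ([2.0, 2.0], [5.0], ["moderate"]),
]
samples = [
    ([0.1, -0.1], [1.2], ["none"]),
    ([1.9, 2.2], [4.8], ["moderate"]),
]
labels = assign_to_prototypes(samples, prototypes)
```

Raising `w_histo` relative to the numeric weights makes a histopathology mismatch count for more in the assignment, which is one way a weighting term could constrain clusters toward shared toxicological profiles.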
Keywords/Search Tags: Data, APAP-treated samples, Clustering, Microarray, Mixed, Gene