Clustering of mixed data types with application to toxicogenomics

Posted on:2006-06-20

Degree:Ph.D

Type:Dissertation

University:North Carolina State University

Candidate:Bushel, Pierre Robert

GTID:1458390008472671

Subject:Biology

Abstract/Summary:

DNA microarray analysis provides unprecedented capabilities for simultaneous measurement of genome-wide alterations in transcription levels. Toxicogenomics bridges gene and protein expression analyses with conventional toxicology to elucidate a global view of the toxic outcomes and mechanistic changes elicited by exposure of biological systems to toxicants and environmental stressors. Inherent in toxicogenomics data are systematic error, stochastic variation, and disparate measurement domains and types, which complicate the acquisition of significant, meaningful and broad biological interpretations from analysis of the data. In this dissertation, a classification regimen comprising analysis of replicate data, outlier diagnostics and gene selection procedures was employed to utilize microarray data for categorization of sub-classes of biological samples exposed to pharmacologic agents. To assess contrasts of centrilobular congestion severity of the rat liver subsequent to exposure to acetaminophen (APAP), microarray data, clinical chemistry evaluations and histopathology observations were integrated in a database and analyzed using mixed linear model approaches. Finally, the k-prototypes algorithm was modified (Modk-prototypes) to the specifications of k-means clustering. Its mixed objective function combines the sum of squared Euclidean distances, which measures the dissimilarity of samples based on numeric microarray and clinical chemistry features, with simple matching, which measures the dissimilarity of samples based on categorical histopathology features. In addition, the objective function included weighting terms for the microarray, clinical chemistry and histopathology domain data in order to computationally integrate the data as well as constrain the clustering of the APAP-treated samples according to similarity of gene expression and toxicological profiles.
Simulated annealing optimization of the Modk-prototypes algorithm (SA-Modk-prototypes) was used to validate the clustering of the APAP-treated samples. The clusters were vetted for gene expression and toxicological (VETed) k-prototypes features that discerned clusters from one another. The VETed k-prototypes are shown to be ideal for distinguishing between zero, minimal, and moderate levels of necrosis of the hepatocytes and centrilobular region of the rat liver that are end-point representations of the clusters of APAP-treated samples. (Abstract shortened by UMI.)
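
The mixed dissimilarity described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the abstract does not give the exact form of the weighting terms, so the per-domain weights, the function names (`mixed_dissimilarity`, `nearest_prototype`), the dictionary keys and the toy values below are all assumptions made for illustration. The sketch uses squared Euclidean distance for the numeric microarray and clinical chemistry features and simple matching (a count of mismatches) for the categorical histopathology features.

```python
def squared_euclidean(a, b):
    """Sum of squared differences between two numeric feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def simple_matching(a, b):
    """Number of categorical features whose values differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def mixed_dissimilarity(sample, prototype, weights):
    """Weighted sum of per-domain dissimilarities.

    `weights` is a hypothetical (w_expr, w_chem, w_histo) triple; the
    actual weighting scheme is not specified in the abstract.
    """
    w_expr, w_chem, w_histo = weights
    return (
        w_expr * squared_euclidean(sample["expr"], prototype["expr"])
        + w_chem * squared_euclidean(sample["chem"], prototype["chem"])
        + w_histo * simple_matching(sample["histo"], prototype["histo"])
    )


def nearest_prototype(sample, prototypes, weights):
    """Index of the prototype with the smallest mixed dissimilarity."""
    costs = [mixed_dissimilarity(sample, p, weights) for p in prototypes]
    return costs.index(min(costs))


# Toy data, invented purely for illustration.
sample = {"expr": [1.0, 2.0], "chem": [0.5], "histo": ["minimal", "present"]}
proto_a = {"expr": [0.0, 2.0], "chem": [1.5], "histo": ["minimal", "absent"]}
proto_b = {"expr": [1.0, 2.0], "chem": [0.5], "histo": ["minimal", "present"]}

cost_equal_weights = mixed_dissimilarity(sample, proto_a, (1.0, 1.0, 1.0))
cost_histo_heavy = mixed_dissimilarity(sample, proto_a, (0.5, 2.0, 3.0))
assigned = nearest_prototype(sample, [proto_a, proto_b], (1.0, 1.0, 1.0))
```

Raising the histopathology weight makes a categorical mismatch count for more relative to the numeric distances, which is one way such weights could constrain clusters toward shared toxicological profiles.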

Keywords/Search Tags:

Data, APAP-treated samples, Clustering, Microarray, Mixed, Gene

Related items

1	The Research Of Gene Selection And Clustering Method In Gene Microarray Data Analysis
2	Research On Relevant Problems Of DNA Microarray Expression Data Analysis
3	Microarray Data Clustering Algorithm
4	Clustering algorithms for time series gene expression in microarray data
5	Comparison of clustering algorithms for gene expression microarray data
6	Association Rules Mining And Its Applications In Microarray Gene Expression Data
7	K-means clustering with automatic determination of K using a Multiobjective Genetic Algorithm with applications to microarray gene expression data
8	Enrichment constrained time dependent clustering analysis of time series microarray data
9	Quantification of gene expressions from microarray images using fuzzy clustering
10	Gaussian Mixture Model-based Clustering Analysis For Gene Microarray Expression Data