Methods for cluster analysis and validation in microarray gene expression data

Posted on: 2007-07-04
Degree: Ph.D
Type: Dissertation
University: University of Illinois at Urbana-Champaign
Candidate: Kosorukoff, Alexander Lvovich
Full Text: PDF
GTID: 1450390005981841
Subject: Biology
Abstract/Summary:
Motivation. Unsupervised learning, or clustering, is frequently used to explore gene expression profiles for insight into both regulation and function. However, the quality of clustering results is often difficult to assess, and each algorithm has tunable parameters, often with no obvious way to choose appropriate values. Most algorithms also require the number of clusters to be predetermined, yet this value is rarely known and is therefore arrived at by subjective criteria. Here we present a method to systematically address these challenges using statistical evaluation.

Method. The method presented compares the quality of clustering results using objective criteria in order to choose the most appropriate algorithm, distance metric and number of clusters for gene network discovery. In brief, two quality assessment metrics are used: the Consensus Share (CS) and the Feature Configuration Statistic (FCS). CS is the percentage of genes (not gene pairs) that are identically clustered in several clusterings, and FCS is a measure of randomness of the observed configuration of transcription factor binding sites among clustered genes.

Results. We evaluate this method using both artificial and yeast microarray data. By choosing parameter settings that minimize FCS values and maximize CS values, we show major advantages over other clustering methods, in particular for identifying combinatorially regulated groups of genes. The results show remarkable enrichment for cis-regulatory elements in clusters of genes known to be regulated by such elements, and provide evidence of extensive combinatorial regulation. Moreover, the method can be generalized to cases where prior information about cis-regulatory sites is absent or where it is desirable to calculate FCS values based on functional categorization.
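
As an illustration of the Consensus Share idea described under Method, the following is a minimal sketch, not the dissertation's implementation. The abstract does not specify what "identically clustered" means operationally; the sketch assumes a gene counts as identically clustered when the set of genes sharing its cluster is the same in every clustering, so that cluster label names do not matter. The function name `consensus_share` and the label-list input format are hypothetical.

```python
from typing import Hashable, List, Sequence


def consensus_share(clusterings: Sequence[Sequence[Hashable]]) -> float:
    """Percentage of genes whose set of cluster co-members is identical
    in every clustering (an assumed reading of 'identically clustered').

    Each clustering is a sequence of cluster labels, one per gene,
    with genes in the same order across all clusterings.
    """
    if not clusterings:
        raise ValueError("at least one clustering is required")
    n_genes = len(clusterings[0])
    if n_genes == 0:
        raise ValueError("clusterings must contain at least one gene")

    # For each clustering, record the co-member set of every gene.
    comembers: List[List[frozenset]] = []
    for labels in clusterings:
        if len(labels) != n_genes:
            raise ValueError("all clusterings must cover the same genes")
        groups = {}
        for gene, label in enumerate(labels):
            groups.setdefault(label, []).append(gene)
        frozen = {label: frozenset(m) for label, m in groups.items()}
        comembers.append([frozen[label] for label in labels])

    # Count genes whose co-member set agrees across all clusterings.
    reference = comembers[0]
    agreeing = sum(
        1
        for gene in range(n_genes)
        if all(c[gene] == reference[gene] for c in comembers)
    )
    return 100.0 * agreeing / n_genes
```

Under this assumption the score is computed per gene rather than per gene pair, which matches the abstract's wording; a pairwise agreement index would be a different measure.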
Keywords/Search Tags: Gene, FCS, Method, Clustering, Values