With the extensive application of DNA microarray technology, huge amounts of gene expression data have been generated. How to analyze and manage these data and extract valuable biological and medical knowledge from them has become both a bottleneck and a hotspot of research in the post-genomic era. Cluster analysis is a major exploratory technique that groups genes with related functions according to the similarity of their expression profiles, which helps in understanding gene function, gene regulation, cellular processes, and cell subtypes. To address specific problems in the cluster analysis of gene expression data, namely the selection of clustering algorithms and parameters, the assessment of clustering results, and the prediction of the number of clusters, the following innovative work has been carried out.

1. The performance of several popular clustering algorithms for gene expression data (hierarchical clustering, K-means clustering, and self-organizing maps (SOMs)), together with the choice of similarity metric and data transformation for each, was studied on gene expression datasets for which external criteria are available. The results showed that hierarchical clustering performs better with the Pearson correlation coefficient as the similarity metric and with row-normalized datasets, whereas K-means clustering and SOMs produce better clusters with Euclidean distance on log-transformed, normalized datasets. In addition, K-means clustering and SOMs have clear advantages over hierarchical clustering for gene clustering, and single-linkage hierarchical clustering is not recommended.

2. The ability of the Silhouette index, Dunn's index, the Davies-Bouldin index, and the figure of merit (FOM) to validate gene clustering results was investigated. The Silhouette index and FOM were found to best reflect the performance of clustering algorithms and the quality of clustering results. Dunn's index should not be used directly to validate gene clustering because it is highly sensitive to outliers, while the Davies-Bouldin index provides better validation than Dunn's index, except for its bias toward single-linkage hierarchical clustering.

3. The ability of the Silhouette index and the Davies-Bouldin index to estimate the number of clusters in a given dataset was studied. Their success rates in predicting the number of clusters proved too low to be acceptable, while the sharp knee in the FOM curve offers only a rough suggestion that carries uncertainty and subjectivity. A relative Silhouette index and a relative Davies-Bouldin index were developed, extending the ability of the original indices to estimate the number of clusters. The expert function for...
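To make the internal validation indices discussed above more concrete, here is a minimal pure-Python sketch of the mean Silhouette width, the Davies-Bouldin index, and Dunn's index, written from their standard definitions with Euclidean distance. It is not the code used in this work. The function names, the toy two-dimensional data, and the convention of scoring singleton clusters as 0 are illustrative assumptions. The demo adds a single outlier to one cluster to show why Dunn's index reacts so strongly: both its numerator and its denominator are set by one extreme pair of points.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    return dict(clusters)


def centroid(members):
    """Coordinate-wise mean of a cluster's profiles."""
    dim = len(members[0])
    return [sum(m[d] for m in members) / len(members) for d in range(dim)]


def silhouette(points, labels):
    """Mean silhouette width (higher is better). Requires at least two clusters."""
    clusters = group(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of p's own cluster (d(p, p) = 0)
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance from p to the members of any other cluster
        b = min(
            sum(euclidean(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin(points, labels):
    """Mean over clusters of the worst (S_i + S_j) / d(c_i, c_j) ratio (lower is better)."""
    clusters = group(points, labels)
    cents = {lab: centroid(m) for lab, m in clusters.items()}
    # S_i: average distance of a cluster's members to its centroid
    scatter = {
        lab: sum(euclidean(p, cents[lab]) for p in m) / len(m)
        for lab, m in clusters.items()
    }
    labs = list(clusters)
    total = sum(
        max(
            (scatter[i] + scatter[j]) / euclidean(cents[i], cents[j])
            for j in labs
            if j != i
        )
        for i in labs
    )
    return total / len(labs)


def dunn(points, labels):
    """Smallest between-cluster distance over largest cluster diameter (higher is better)."""
    members = list(group(points, labels).values())
    # Largest diameter: the maximum pairwise distance inside any single cluster
    max_diam = max(max(euclidean(p, q) for p in m for q in m) for m in members)
    # Smallest separation: the closest pair of points from two different clusters
    min_sep = min(
        euclidean(p, q)
        for i, m1 in enumerate(members)
        for m2 in members[i + 1:]
        for p in m1
        for q in m2
    )
    return min_sep / max_diam


if __name__ == "__main__":
    # Toy 2-D "profiles": two tight, well-separated groups of four points
    clean = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    labels = [0] * 4 + [1] * 4
    # The same data with one outlying point assigned to cluster 0
    noisy = clean + [(5, 5)]
    noisy_labels = labels + [0]
    for name, pts, labs in (("clean", clean, labels), ("outlier", noisy, noisy_labels)):
        print(name,
              round(silhouette(pts, labs), 3),
              round(davies_bouldin(pts, labs), 3),
              round(dunn(pts, labs), 3))
```

On this toy data the single outlier drops Dunn's index from 9 to 1, while the Davies-Bouldin index roughly doubles (from 0.1 to about 0.22) and the mean Silhouette width decreases more gradually. The latter two average over clusters or points, so one extreme point has a smaller effect on them.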