Research On Algorithms And Application For Cluster Analysis Of Gene Expression Data

Posted on:2007-11-06

Degree:Doctor

Type:Dissertation

Country:China

Candidate:C M Yang

GTID:1104360212470876

Subject:Biomedical engineering

Abstract/Summary:

With the widespread use of DNA microarray technology, huge amounts of gene expression data have been generated. Analyzing these data to extract valuable biological and medical knowledge has become both a bottleneck and a research hotspot in the post-genomic age. Cluster analysis is a major exploratory technique that groups genes with related functions according to similarities in their expression profiles, which helps in understanding gene function, gene regulation, cellular processes, and cell subtypes. To address specific problems in the cluster analysis of gene expression data, namely the selection of clustering algorithms and parameters, the assessment of clustering results, and the prediction of the number of clusters, the following work was carried out.

1. The performance of several popular clustering algorithms for gene expression data (hierarchical clustering, K-means clustering, and self-organizing maps, or SOMs), together with their choice of similarity metric and data transformation, was studied on gene expression datasets with external criteria. The results showed that hierarchical clustering works best with the Pearson correlation coefficient as the similarity metric and with datasets normalized by row, while K-means clustering and SOMs produce better clusters with Euclidean distance and normalized, logarithm-transformed datasets. In addition, K-means clustering and SOMs have distinct advantages over hierarchical clustering for gene clustering, and single-linkage hierarchical clustering is not recommended.

2. The ability of the Silhouette index, Dunn's index, the Davies-Bouldin index, and the figure of merit (FOM) to validate gene clustering results was investigated. The Silhouette index and FOM reflect the performance of clustering algorithms and the quality of clustering results well. Dunn's index should not be used directly for gene clustering validation because it is highly susceptible to outliers. The Davies-Bouldin index provides better validation than Dunn's index, except for its preference for single-linkage hierarchical clustering.

3. The ability of the Silhouette index and the Davies-Bouldin index to estimate the number of clusters in a given dataset was studied. The prediction success rate of both indices is too low to be acceptable, while the sharp knee of the FOM curve gives only a rough suggestion, subject to uncertainty and subjectivity. A relative Silhouette index and a relative Davies-Bouldin index were developed to extend the estimation ability of the two indices. The expert function for...
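As a rough illustration of two of the validation indices named in the abstract, the following is a minimal sketch of the standard textbook definitions of the Silhouette index and the Davies-Bouldin index, together with the two dissimilarity measures mentioned (Euclidean distance and one minus the Pearson correlation coefficient). It is not the dissertation's implementation: all function names, the toy expression profiles, and the handling of singleton clusters and flat profiles are assumptions made for this example, and the relative indices proposed in the thesis are not reproduced here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """Dissimilarity 1 - r, where r is the Pearson correlation coefficient."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # assumption: treat a flat profile as uncorrelated
    return 1.0 - cov / math.sqrt(var_a * var_b)


def group_by_label(labels):
    """Map each cluster label to the list of member indices."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    return clusters


def silhouette_index(profiles, labels, dist=euclidean):
    """Mean silhouette width; values near 1 suggest compact, separated clusters."""
    clusters = group_by_label(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")
    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(profiles[i], profiles[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(profiles[i], profiles[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


def davies_bouldin_index(profiles, labels):
    """Davies-Bouldin index with Euclidean distance; lower values are better.

    Sketch only: it does not guard against two clusters sharing a centroid.
    """
    clusters = group_by_label(labels)
    if len(clusters) < 2:
        raise ValueError("the Davies-Bouldin index needs at least two clusters")
    centroids, scatter = {}, {}
    for label, members in clusters.items():
        dims = zip(*(profiles[j] for j in members))
        centroids[label] = [sum(values) / len(members) for values in dims]
        scatter[label] = sum(
            euclidean(profiles[j], centroids[label]) for j in members
        ) / len(members)
    worst_ratios = [
        max(
            (scatter[p] + scatter[q]) / euclidean(centroids[p], centroids[q])
            for q in clusters
            if q != p
        )
        for p in clusters
    ]
    return sum(worst_ratios) / len(worst_ratios)


# Hypothetical toy data: three rising and three falling expression profiles.
toy_profiles = [
    [1.0, 2.0, 3.0], [1.1, 2.1, 3.2], [0.9, 1.9, 2.9],
    [3.0, 2.0, 1.0], [3.2, 2.1, 0.9], [2.9, 1.8, 1.1],
]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the two visible groups
bad_labels = [0, 1, 0, 1, 0, 1]   # deliberately mixes the groups

for name, labels in (("good", good_labels), ("bad", bad_labels)):
    print(name,
          silhouette_index(toy_profiles, labels),
          silhouette_index(toy_profiles, labels, dist=pearson_distance),
          davies_bouldin_index(toy_profiles, labels))
```

On this toy data the partition that matches the two groups scores a higher Silhouette index and a lower Davies-Bouldin index than the mixed partition, which is the behavior a validation index is expected to show. Passing `dist=pearson_distance` makes the Silhouette index compare profile shapes rather than absolute expression levels.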

Keywords/Search Tags:

gene expression, cluster analysis, cluster validation, hierarchical clustering, K-means clustering, self-organizing maps

Related items

1	Research Of Clustering Strategies For Dynamic Electrocardiogram Waveform
2	Detection Of Arterial Input Function From Cerebral Perfusion Using DSC-MRI Based On Clustering Analysis
3	A Study On The Cluster Analysis For Parkinson-related Genes
4	Research On Weighted Clustering Algorithm Based On Tumor Gene Expression Data
5	Clustering Human Wrist Pulses For Traditional Chinese Medicine
6	Gene Expression Clustering Analysis Method
7	Relative Technology And Realization Of DNA Microarray Images Recognition
8	Data Clustering And Medical Image Analysis Based On Subspace Analysis
9	Morphology And Clustering Analysis Of Adults Maxillary Tuberosity Based On CBCT
10	Construction Of Prediction Model Of Liver Fibrosis In Hepatitis B Patients Based On Similarity Feature Clustering