
The Design And Analysis Of Clustering Algorithms On Gene Expression Data

Posted on: 2009-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z B Jiang
Full Text: PDF
GTID: 2178360272986740
Subject: Computer application technology
Abstract/Summary:
DNA microarray technology has made it possible to monitor the expression levels of thousands of genes simultaneously during biological processes. Elucidating the patterns in gene expression offers a tremendous opportunity to improve our understanding of functional genomics. However, the large number of genes and the complexity of biological networks make gene expression data much harder to interpret. A first step toward addressing this challenge is clustering, an essential process for revealing natural structures and identifying interesting patterns in the underlying data. This thesis studies clustering algorithms applied to gene expression data.

In the first part, this thesis proposes a clustering algorithm based on minimum spanning trees (MST), called the MST-based Uncertain Partition clustering algorithm (MUP). It performs clustering analysis on gene expression data quickly and effectively. The MUP algorithm has three main features. First, it differs from other MST-based clustering algorithms in how it determines inconsistent edges, which takes two steps: sliding a window over the edges to find potentially inconsistent edges, and then using an objective function to decide which of them are truly inconsistent. Second, it determines the number of clusters automatically, without any biological information. Third, it can still find interesting patterns against a noisy background. Applied to two real gene expression datasets, Wen's and Iyer's, the MUP algorithm produces good clustering results on both. These results indicate that the MUP algorithm is effective on large-scale gene expression data.

In the second part, this thesis studies multi-view clustering in orthogonal subspaces as a way to analyze gene expression data. The method has been applied successfully in other fields, such as text clustering and image clustering, and in theory it should also apply to gene expression data, so this thesis evaluates it on a real gene expression dataset, Cho's. The experimental analysis shows that multi-view clustering in orthogonal subspaces can group genes according to multiple profiles.

This thesis concludes that both the MUP algorithm and clustering in orthogonal subspaces can effectively find interesting patterns in gene expression data, and that both are helpful for studying genes and for follow-up work.
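The general idea behind MST-based clustering with a sliding window can be illustrated with a minimal Python sketch. This is not the MUP algorithm itself: the abstract does not specify the objective function or the window rule, so the sketch assumes a simple stand-in, where an edge is cut when its weight exceeds `factor` times the mean weight of its neighbours in a window over the Prim insertion order. The names `prim_mst`, `mst_clusters`, `window` and `factor` are hypothetical.

```python
import math


def prim_mst(points):
    """Return MST edges (u, v, weight) in the order Prim's algorithm adds them."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])  # Euclidean distance
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(points, window=2, factor=3.0):
    """Cluster points by cutting MST edges that look inconsistent locally.

    Assumption (not from the thesis): an edge is inconsistent when its weight
    is more than `factor` times the mean weight of the edges within `window`
    positions of it in the Prim insertion order.
    """
    edges = prim_mst(points)
    parent = list(range(len(points)))  # union-find forest over the points

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, (u, v, w) in enumerate(edges):
        lo, hi = max(0, i - window), min(len(edges), i + window + 1)
        neighbours = [edges[j][2] for j in range(lo, hi) if j != i]
        local_mean = sum(neighbours) / len(neighbours) if neighbours else w
        if w <= factor * local_mean:       # consistent edge: keep it
            parent[find(u)] = find(v)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]
```

Calling `mst_clusters` on a list of coordinate tuples returns one cluster label per point. As in the approach described above, the number of clusters is not passed in; it follows from how many edges are cut. On real expression data each point would be a gene's expression profile, and a correlation-based distance may be a better fit than the Euclidean distance assumed here.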
Keywords/Search Tags: Gene expression data, Clustering algorithms, Minimum spanning trees, MUP clustering algorithm, Clustering in orthogonal subspaces