Gene Microarray Data Analysis Based On Clustering Algorithms

Posted on: 2009-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Lu
Full Text: PDF
GTID: 2178360272456795
Subject: Detection Technology and Automation

Abstract/Summary:
Microarray experiments make it possible to observe the expression levels of thousands of genes simultaneously, and methods for analysing gene expression data are an active research topic in bioinformatics. Various data mining methods are currently used to uncover the underlying gene expression patterns, which can support reasonable interpretations when identifying groups of genes or samples. Cluster analysis is a major exploratory technique for grouping genes with related functions according to the similarity of their expression profiles, and it helps in understanding gene function, gene regulation, cellular processes, and cell subtypes.

Microarray data also have distinctive traits, such as small sample sizes, high dimensionality, and nonlinearity. Taking these characteristics into account, this thesis analyses gene expression data from two perspectives, genes and samples, covering gene function and co-expression studies on the one hand and cancer classification and sub-classification on the other. The main results are as follows:

1. Changes in gene expression are often associated with changes in gene function, and clusters of gene expression patterns are believed to help identify co-expressed genes and their regulation. A new fuzzy similarity relation matrix is constructed, and a modified clustering algorithm based on the fuzzy similarity relation is proposed. On this basis, a new method is used to find the initial centers of the fuzzy C-means (FCM) algorithm. Experimental results show that the proposed method not only overcomes, to a certain extent, the limitations of the FCM algorithm, but also identifies cell-cycle-regulated genes, whose expression levels change periodically during the cell cycle.

2. Clustering samples can reveal previously unknown disease subtypes. Because the experimental process introduces a great deal of noise, the data must be de-noised before cluster analysis. Based on wavelet de-noising, an improved fuzzy C-means algorithm is applied to leukaemia gene expression data. The results show that the algorithm is effective.

3. Because gene expression data are high-dimensional with few samples, a two-way clustering algorithm based on representative entropy is proposed. First, genes are clustered with a SOM network, and characteristic genes are selected according to the fluctuation coefficient. Then the quality of the gene clustering is assessed by the value of the representative entropy. Finally, the fuzzy C-means algorithm is used to classify the samples. The experimental results show that the algorithm reduces the dimensionality of the feature space and improves clustering accuracy.
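
For readers unfamiliar with the fuzzy C-means algorithm that all three contributions build on, the following is a minimal sketch of the textbook version only. It does not reproduce the thesis's modifications (the fuzzy-similarity-based choice of initial centers, the wavelet de-noising step, or the SOM and representative-entropy stage). The function name `fuzzy_c_means`, its parameters, and the random initialisation of memberships are assumptions made for illustration.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy C-means on a list of equal-length numeric vectors.

    Returns (centers, memberships); memberships[i][k] is the degree to which
    points[i] belongs to cluster k, and each row sums to 1.
    Assumption: memberships start random, not from the thesis's method.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )

        # Membership update from squared Euclidean distances.
        shift = 0.0
        for i in range(n):
            # Floor the distance so a point sitting on a center is safe.
            d2 = [
                max(sum((points[i][d] - c[d]) ** 2 for d in range(dim)), 1e-12)
                for c in centers
            ]
            inv = [v ** (-1.0 / (m - 1.0)) for v in d2]
            inv_sum = sum(inv)
            new_row = [v / inv_sum for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break

    return centers, u
```

With samples as rows and expression values as columns, calling `fuzzy_c_means(samples, 2)` returns soft assignments that can be hardened by taking the largest membership in each row. Because the result depends on the starting point, a better choice of initial centers, which is the problem the first contribution addresses, can matter in practice.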
Keywords/Search Tags: microarrays, gene expression, cluster analysis, fuzzy C-means, self-organizing map, representative entropy