Analysis Of Gene Expression Data Clustering

Posted on:2008-12-20

Degree:Master

Type:Thesis

Country:China

Candidate:H Yi

Full Text:PDF

GTID:2120360272477024

Subject:Biomedical engineering

Abstract/Summary:

With the development of microarray technology, more and more gene expression datasets are being produced, so extracting useful information from them has become an important issue in bioinformatics research. Genes with similar functions usually share similar expression patterns, so the function of an unknown gene can be predicted by analyzing genes with similar expression patterns. A clustering algorithm is a data-mining method that partitions data into clusters according to their similarity, so that similar items end up in the same group. Using clustering, genes with similar expression can be grouped into the same cluster, which helps in discovering gene functions and the relationships between genes.

However, clustering is a subjective process: different choices of algorithm, number of clusters, or starting seeds lead to different outcomes. This makes the results of gene expression data clustering more subjective. The key question in gene expression data analysis is therefore how to use the existing algorithms effectively and make clustering more objective, which would improve the accuracy of the analysis.

To address these problems, the following work was carried out in this thesis:

(1) Gene expression data contain a large number of missing values arising from various causes, and these degrade the accuracy of clustering. A General Regression Neural Network (GRNN) was employed to estimate the missing values.
(2) Different clustering algorithms for gene expression data were studied, some advanced algorithms were introduced, and the relationship between clustering algorithms and the distribution structure of the data was investigated.
(3) Gene expression data with different distribution structures should be clustered by different algorithms, but the distribution structure of high-dimensional gene expression data is difficult to obtain. This thesis therefore takes the stability of clustering results as the evaluation criterion and proposes a stability-based method for selecting a clustering algorithm.
(4) When the same algorithm is applied to the same dataset, the clustering results may vary from run to run because the starting seeds differ each time. The choice of seeds affects the probability of falling into a local minimum and the number of iterations needed. This thesis proposes a PCA-based method for setting the seeds used in gene expression data clustering.
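Item (1) relies on a General Regression Neural Network. In its standard form, a GRNN's output is a Gaussian-kernel weighted average of the training targets, with the kernel width (the "spread") as its only parameter. The sketch below shows one plausible way to apply it to imputation, assuming that genes with no missing values serve as training samples, that the conditions observed for the incomplete gene are the inputs, and that `sigma` is chosen by hand. The function names `grnn_predict` and `grnn_impute` are hypothetical, and the abstract does not give the thesis's actual network configuration.

```python
import math


def grnn_predict(train_x, train_y, x, sigma):
    """GRNN output: Gaussian-kernel weighted average of the training targets."""
    num = 0.0
    den = 0.0
    for xi, yi in zip(train_x, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        w = math.exp(-d2 / (2.0 * sigma ** 2))  # pattern-layer activation
        num += w * yi                              # summation layer, numerator
        den += w                                   # summation layer, denominator
    if den == 0.0:
        # Every kernel weight underflowed: fall back to the mean target.
        return sum(train_y) / len(train_y)
    return num / den


def grnn_impute(matrix, sigma=1.0):
    """Fill None cells of a genes x conditions matrix (assumption: complete genes are the training set)."""
    complete = [row for row in matrix if None not in row]
    if not complete:
        raise ValueError("need at least one gene with no missing values")
    result = [list(row) for row in matrix]
    for g, row in enumerate(matrix):
        observed = [c for c, v in enumerate(row) if v is not None]
        for j, v in enumerate(row):
            if v is not None:
                continue
            # Inputs: the conditions observed for this gene; target: condition j.
            train_x = [[r[c] for c in observed] for r in complete]
            train_y = [r[j] for r in complete]
            result[g][j] = grnn_predict(train_x, train_y, [row[c] for c in observed], sigma)
    return result
```

With `sigma = 0.5`, a missing cell in a gene whose observed profile is close to two complete genes is filled with roughly the average of their values at that condition; a smaller `sigma` makes the estimate lean more heavily on the single nearest gene.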
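The abstract does not specify how stability is measured in item (3). A common approach, assumed here, is to cluster two random subsamples of the genes, compare the two labelings on the genes they share, and average the agreement over many repetitions; the algorithm whose results agree best across subsamples is selected. The sketch uses the Rand index as the agreement measure and compares plain k-means against single-linkage hierarchical clustering. These candidates, the 80% subsample fraction, and all function names are illustrative assumptions, not the thesis's choices.

```python
import itertools
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, iters=50):
    """Plain k-means with random initial centers drawn by `rng`."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster becomes empty.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def single_linkage(points, k, rng=None):
    """Naive agglomerative single-linkage clustering cut at k clusters (rng unused)."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in itertools.combinations(range(len(clusters)), 2):
            d = min(sq_dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe: a < b, so index a is unaffected
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def rand_index(la, lb):
    """Fraction of point pairs on which two labelings agree (same vs. different cluster)."""
    pairs = list(itertools.combinations(range(len(la)), 2))
    agree = sum((la[i] == la[j]) == (lb[i] == lb[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(points, k, cluster_fn, n_pairs=20, frac=0.8, seed=0):
    """Mean agreement between clusterings of two random subsamples, on their overlap."""
    rng = random.Random(seed)
    n = len(points)
    m = int(frac * n)
    scores = []
    for _ in range(n_pairs):
        idx1 = sorted(rng.sample(range(n), m))
        idx2 = sorted(rng.sample(range(n), m))
        lab1 = dict(zip(idx1, cluster_fn([points[i] for i in idx1], k, rng)))
        lab2 = dict(zip(idx2, cluster_fn([points[i] for i in idx2], k, rng)))
        common = sorted(set(idx1) & set(idx2))
        if len(common) >= 2:
            scores.append(rand_index([lab1[i] for i in common], [lab2[i] for i in common]))
    return sum(scores) / len(scores) if scores else 0.0


def select_algorithm(points, k, candidates):
    """Return the name of the candidate algorithm with the highest stability score."""
    return max(candidates, key=lambda name: stability(points, k, candidates[name]))
```

For example, `select_algorithm(points, 2, {"single_linkage": single_linkage, "kmeans": kmeans})` returns the name of the more stable candidate. The plain Rand index is easy to compute but drifts toward 1 when there are many clusters, because most pairs fall in different clusters under both labelings; a chance-corrected measure such as the adjusted Rand index may be preferable in practice.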
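Item (4) proposes PCA-based seed setting, but the abstract does not describe the procedure. One simple variant, assumed here for illustration, projects every gene onto the first principal component, sorts the genes by that projection, splits the sorted list into k equal-sized groups, and uses each group's mean profile as an initial center. Because the seeds are computed deterministically, repeated k-means runs on the same data give the same result. The first component is found by power iteration on the covariance matrix; `pca_seeds`, `kmeans_from_seeds`, and the other names are hypothetical.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def first_principal_component(points, iters=200):
    """Return the mean profile and the leading eigenvector of the covariance matrix."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        # Power iteration: multiply by the covariance matrix, then normalise.
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


def pca_seeds(points, k):
    """Deterministic seeds: sort genes along PC1, split into k chunks, average each chunk."""
    mean, v = first_principal_component(points)
    proj = [sum((p[j] - mean[j]) * v[j] for j in range(len(v))) for p in points]
    order = sorted(range(len(points)), key=lambda i: proj[i])
    n, d = len(points), len(points[0])
    seeds = []
    for c in range(k):
        chunk = order[c * n // k:(c + 1) * n // k]
        seeds.append([sum(points[i][j] for i in chunk) / len(chunk) for j in range(d)])
    return seeds


def kmeans_from_seeds(points, seeds, iters=100):
    """Standard k-means iterations started from the given seeds."""
    centers = [list(s) for s in seeds]
    labels = []
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

On two groups of genes separated along one direction, the seeds land on the two group means, so k-means starts at a good partition and stops after one pass. Power iteration from the fixed start vector assumes that vector is not orthogonal to the leading eigenvector, which holds for almost all data.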

Keywords/Search Tags:

gene expression data, clustering, missing value, algorithm selection, seeds

Related items

1	A Feature Selection Algorithm For Biological Data Based On Dynamic Iterative Spectral Clustering
2	Research On Weighted Two-Way Clustering Algorithm Based On Gene Expression Microarray Datasets
3	Research On Multi-objective Clustering Method Of Incomplete Gene Expression Data
4	Research On Robust Matrix Factorization Method And Its Application In Gene Expression Data
5	Research On Feature Selection For Gene Expression Data
6	Research On Gene Expression Data Analysis Method And Its Application
7	Research On Hybrid Gene Selection Method Based On Clustering
8	Clustering Based On Genetic Algorithm For Gene Expression Data
9	Research And Implementation Of Biological Feature Selection Algorithm Based On Hierarchical Clustering
10	Research On 2D Spatial Gene Selection Algorithm Based On Unbalanced Gene Data