The Research On Clustering Algorithm Applied To Gene Expression Data

Posted on:2014-12-30

Degree:Master

Type:Thesis

Country:China

Candidate:X H Wang

Full Text:PDF

GTID:2268330398499494

Subject:Computer software and theory

Abstract/Summary:

Gene chip experiments now produce expression data for thousands of genes. These data contain rich information that can help explain the phenomena of life; by analyzing them, we can understand how genetic information is converted into functional gene products. Clustering, as an important analysis method, is widely used to extract biological information from gene expression data.

The basic principle of a clustering algorithm is to divide multiple variables into multiple classes according to a similarity measure. Conventional clustering algorithms cluster genes or conditions separately. They rest on the assumption that related genes behave similarly under all conditions, so they can capture only the global information in gene expression data. Because many local patterns exist in high-dimensional gene expression data, coclustering algorithms have recently been proposed as a powerful computational tool for detecting subsets of genes that exhibit a consistent pattern over subsets of conditions. (A cocluster of a gene expression dataset is a subset of genes that exhibit similar expression patterns along a subset of conditions.) Despite much research in this domain, existing coclustering algorithms have critical limitations in identifying coclusters with different types of correlations in the data and in capturing overlapping coclusters in the data matrix. In this article, we compare and analyze several coclustering algorithms and then present a new coclustering algorithm that is combined with a clustering algorithm.
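
To make the cocluster definition above concrete, the following minimal sketch scores a candidate cocluster with the mean squared residue, a measure commonly associated with Cheng and Church's biclustering work. It is an illustrative stand-in and not the ranking-based objective proposed in the thesis, which the abstract does not specify. The function name `mean_squared_residue` and the toy matrix `expr` are assumptions made for this example. Note that this particular score rewards additive (shifted) patterns only; it does not by itself capture negatively correlated genes.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix given by `rows` x `cols`.

    A score near 0 means the selected genes (rows) follow the same
    additive pattern over the selected conditions (columns).
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


# Hypothetical toy data: rows are genes, columns are conditions.
# Genes 0-2 share an additive pattern on conditions 0 and 2 only.
expr = [
    [1.0, 9.0, 2.0, 0.5],
    [2.0, 1.0, 3.0, 7.0],
    [4.0, 5.0, 5.0, 2.0],
    [8.0, 0.0, 1.0, 3.0],
]

local_score = mean_squared_residue(expr, [0, 1, 2], [0, 2])
global_score = mean_squared_residue(expr, range(4), range(4))
print(local_score, global_score)
```

The local submatrix scores essentially zero while the full matrix does not, which is the kind of local pattern a conventional clustering over all conditions would miss.
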
We evaluated our algorithm on several real-world gene expression datasets. The experimental results showed that the proposed algorithm is able to find biologically significant coclusters and also outperformed some well-known existing coclustering algorithms in terms of the quality, size, and biological significance of the coclusters.

The main innovations of this article are the following:

(1) Based on ideas from lossy data coding and compression, we present a simple but effective technique for clustering genes. The goal is to find the optimal segmentation, that is, the one that minimizes the overall coding length. The advantage of this algorithm is that it can automatically determine the number of clusters.

(2) After analyzing the advantages and disadvantages of currently popular coclustering algorithms, we combine a coclustering algorithm with the clustering algorithm based on lossy data compression. Our algorithm uses a novel ranking-based objective function that is optimized to simultaneously find large coclusters with minimum residual errors. It allows positively and negatively correlated objects to be members of the same cocluster and can extract overlapping coclusters. In addition, the coclusters can be arbitrarily positioned in the data matrix.
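
The coding-length idea in innovation (1) can be sketched as follows. The abstract does not give the formula, so this example assumes the coding-length function commonly used in lossy-compression-based segmentation (as in the work of Ma, Derksen, Hong, and Wright): for m vectors of dimension n stored as the columns of W, L(W) = ((m + n) / 2) · log2 det(I + (n / (ε² m)) · W Wᵀ), plus the bits needed to record group membership. The function names, the distortion `eps = 0.1`, and the toy vectors are all assumptions made for illustration; the thesis may use a different variant and a different search procedure.

```python
import math


def log2_det_spd(a):
    """log2(det(a)) for a symmetric positive-definite matrix, via Cholesky."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(s)
                log_det += 2.0 * math.log2(low[i][j])
            else:
                low[i][j] = s / low[j][j]
    return log_det


def coding_length(vectors, eps):
    """Assumed bits needed to code `vectors` up to distortion `eps`."""
    m, n = len(vectors), len(vectors[0])
    scale = n / (eps * eps * m)
    # Build I + scale * W W^T, where the columns of W are the vectors.
    mat = [[(1.0 if p == q else 0.0)
            + scale * sum(v[p] * v[q] for v in vectors)
            for q in range(n)] for p in range(n)]
    return 0.5 * (m + n) * log2_det_spd(mat)


def segmented_length(groups, eps):
    """Total bits for a segmentation: per-group code plus membership labels."""
    total = sum(len(g) for g in groups)
    bits = 0.0
    for g in groups:
        bits += coding_length(g, eps)
        bits += len(g) * -math.log2(len(g) / total)
    return bits


# Hypothetical toy profiles: group `a` varies along one axis, `b` along the other.
a = [[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0], [-2.0, 0.0]]
b = [[0.0, 1.0], [0.0, 2.0], [0.0, -1.0], [0.0, -2.0]]
eps = 0.1

one_group = segmented_length([a + b], eps)
good_split = segmented_length([a, b], eps)
bad_split = segmented_length([a[:2] + b[:2], a[2:] + b[2:]], eps)
print(one_group, good_split, bad_split)
```

In this toy case the split that matches the two underlying patterns needs the fewest bits, the single group needs more, and the mismatched split needs the most. A search over segmentations (for example, greedy merging of groups while the total length keeps decreasing) would therefore settle on two groups without the number of clusters being specified in advance.
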

Keywords/Search Tags:

gene expression data, clustering, coclustering, lossy data compression, positive and negative correlation

Related items

1	Bi-correlation Pattern Discovery Of Gene Expression Based On Bi-clustering Method
2	Study Of Neural Network Ensemble Algorithm And Application Of It In Gene Expression Data Analysis
3	Study Of Gene Expression Data Analysis Based On Pattern Recognition Methods
4	The Research And Implementation On Clustering Algorithm Of Gene Expression Data
5	The Design And Analysis Of Clustering Algorithms On Gene Expression Data
6	Algorithm Of Fast Clustering For Gene Expression Data
7	The Research And Application Of Particle Swarm Optimization Algorithm In Clustering Analysis Of Gene Expression Data
8	The Research And Application On Gene Expression By Clustering Algorithms
9	Research Of Co-clustering Algorithms For Cancer Subtypes Discovery Based On Gene Expression Data
10	Clustering Analysis Based On The Ant System For Gene Expression Data