Microarray Data Clustering Algorithm

Posted on: 2007-11-19 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Ma | Full Text: PDF
GTID: 2208360182994898 | Subject: Computer software and theory
Abstract/Summary:
As the Human Genome Project has progressed, research into the function of individual genes in the genome has steadily deepened. Analyzing gene expression at different times and under different conditions is the main approach to determining gene function. cDNA microarray technology is an important tool for biologists studying genes. Because microarray experiments generate large volumes of gene expression data, data mining techniques are essential for extracting valuable information from them.

Genes with similar functions tend to have similar expression patterns, so the function of an unknown gene can be predicted by examining genes with similar expression. A clustering algorithm partitions data by similarity so that like items are grouped together. Applied to gene expression data, clustering places genes with similar expression in the same group, which helps biologists uncover gene function and inheritance patterns.

Most clustering algorithms applied to gene expression data originate in non-biological fields, and they show several shortcomings in this application. First, K-means and self-organizing maps require the user to supply the number of clusters, which is hard to estimate before clustering, and the final result changes substantially when this parameter changes. Second, many traditional clustering algorithms, such as hierarchical clustering, are sensitive to noisy data. Finally, because the traditional algorithms come from non-biological fields, the resulting clusters do not carry precise biological meaning.

To address these shortcomings, the "K nearest neighbors absorbed first" idea and some known biological knowledge are introduced into a density-based algorithm, and a novel K-nearest-neighbors-absorbed-first clustering algorithm is designed and implemented in this paper. The algorithm is applied to a yeast cell cycle dataset. A comparison with K-means shows that the K-nearest-neighbors-absorbed-first clustering algorithm provides more useful information than K-means, in both cluster structure and biological meaning.
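
The abstract does not give the details of the proposed algorithm, so the following is only a minimal sketch of the general idea of density-based clustering in which a cluster first absorbs the k nearest neighbors of its members. It is not the thesis's method. The function names, the choice of Pearson-correlation distance, the density measure (mean distance to the k nearest neighbors), and the `max_dist` cutoff are all assumptions made for illustration. Unlike K-means, the sketch does not take the number of clusters as input.

```python
import math


def pearson_distance(a, b):
    """Return 1 - Pearson correlation of two expression profiles (assumed metric)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # flat profile: treat as uncorrelated
    return 1.0 - cov / math.sqrt(var_a * var_b)


def knn_density_cluster(profiles, k, max_dist):
    """Illustrative density-based clustering (hypothetical, not the thesis's algorithm).

    Genes are visited from densest to sparsest. Each unassigned gene seeds a
    cluster, which then repeatedly absorbs the k nearest neighbors of its
    members, provided they are unassigned and within max_dist.
    """
    n = len(profiles)
    if n == 0:
        return []
    if n == 1:
        return [0]
    k = max(1, min(k, n - 1))
    dist = [[pearson_distance(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]
    # k nearest neighbors of every gene, excluding the gene itself
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j, i=i: dist[i][j])[:k]
        for i in range(n)
    ]
    # A smaller mean distance to the k nearest neighbors means a denser region
    mean_knn = [sum(dist[i][j] for j in neighbours[i]) / k for i in range(n)]
    order = sorted(range(n), key=lambda i: mean_knn[i])

    labels = [-1] * n
    next_label = 0
    for seed in order:
        if labels[seed] != -1:
            continue
        labels[seed] = next_label
        frontier = [seed]
        while frontier:
            gene = frontier.pop()
            for j in neighbours[gene]:
                if labels[j] == -1 and dist[gene][j] <= max_dist:
                    labels[j] = next_label
                    frontier.append(j)
        next_label += 1
    return labels


if __name__ == "__main__":
    # Toy data: three rising and three falling profiles (made up for illustration)
    toy = [
        [1, 2, 3, 4], [2, 4, 6, 8.5], [1, 2.1, 2.9, 4.2],
        [4, 3, 2, 1], [8, 6.2, 4, 2], [4.1, 3, 1.9, 1],
    ]
    print(knn_density_cluster(toy, k=2, max_dist=0.5))
```

On the toy data the rising and falling profiles end up in separate clusters without the number of clusters being supplied. The sketch still depends on `k` and `max_dist`, so it trades one user parameter for two others and makes no claim about biological meaning.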
Keywords/Search Tags: microarray, gene expression data, clustering, K nearest neighbor, density-based