Clustering Analysis Algorithm Applied In Analysis Of Gene Expression Data

Posted on:2013-06-09

Degree:Master

Type:Thesis

Country:China

Candidate:J Sun

Full Text:PDF

GTID:2248330362971848

Subject:Computer application technology

Abstract/Summary:

With the development of the Human Genome Project, tens of thousands of genes have been identified and the volume of gene sequence data has grown massively. But data is not the same as information or knowledge; it is only their source. How to extract useful knowledge from large amounts of gene expression data with automatic analysis tools has therefore drawn increasing attention to data analysis methods and tools. Data mining technology has been widely applied to many aspects of gene expression profiling and has achieved considerable success. Data mining extracts hidden, previously unknown, and potentially useful information and knowledge from the large databases used in practical applications. As a new technology, it gives biologists an effective method and tool for analyzing data and a powerful means of analyzing gene expression data. Data mining methods and tools include classification and prediction, clustering analysis, association analysis, sequence analysis and time analysis, and outlier analysis.

As an effective data analysis tool, cluster analysis has been widely applied in image processing, information retrieval, data mining, and other fields. The huge volume of gene expression data is one of the main reasons for using clustering algorithms to analyze it; another is that relatively few genes have a known biological function. Cluster analysis divides a group of samples into several subclasses according to their degree of similarity. Its basic idea is to identify groups of the same kind so that differences within a group are as small as possible and differences between groups are as large as possible.

This paper introduces two similarity measures used by clustering algorithms, Euclidean distance and the Pearson correlation coefficient, and puts forward a proportional similarity measure. It also introduces two kinds of clustering validity evaluation, external and internal.
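
The two standard similarity measures named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's code: the function names and the toy expression profiles are assumptions made for illustration, and the proportional similarity measure proposed in the thesis is not shown because the abstract does not define it.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two equal-length expression profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_correlation(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy profiles (expression level of one gene across 4 samples).
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # same shape as gene_a, twice the magnitude
gene_c = [4.0, 3.0, 2.0, 1.0]   # opposite trend to gene_a

print(euclidean_distance(gene_a, gene_b))
print(pearson_correlation(gene_a, gene_b))
print(pearson_correlation(gene_a, gene_c))
```

The toy profiles show why the choice of measure matters: gene_a and gene_b are far apart in Euclidean terms but perfectly correlated, so a correlation-based measure groups genes by the shape of their expression pattern and not by its magnitude.
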
This paper studies three classic algorithms: hierarchical clustering, K_means clustering, and self-organizing maps clustering. Depending on the similarity criterion used between clusters, hierarchical clustering is divided into four linkage variants, and the clustering validity of these four variants is compared under the two similarity measures. Under Euclidean distance and different numbers of experimental iterations, K_means clustering and self-organizing maps clustering achieve better correct rates of gene clustering and better clustering validity. After comparing the advantages and disadvantages of the three algorithms, the paper proposes an improved algorithm based on hierarchical clustering and self-organizing maps clustering. According to the experimental data, the improved algorithm overcomes the defects of the original methods to some degree and shows advantages of its own.
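
How a linkage criterion changes agglomerative hierarchical clustering can be sketched as follows. The abstract does not name its four linkage variants, so the four used here (single, complete, average, centroid) are an assumption based on common practice. The function names and the toy data are hypothetical, and the naive merge loop is meant for illustration, not for large data sets.

```python
import math
from itertools import combinations

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def centroid(points):
    """Component-wise mean of a list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def cluster_distance(c1, c2, data, linkage):
    """Distance between two clusters, each given as a list of row indices."""
    if linkage == "centroid":
        return euclidean(centroid([data[i] for i in c1]),
                         centroid([data[j] for j in c2]))
    pairwise = [euclidean(data[i], data[j]) for i in c1 for j in c2]
    if linkage == "single":      # closest pair of members
        return min(pairwise)
    if linkage == "complete":    # farthest pair of members
        return max(pairwise)
    if linkage == "average":     # mean over all cross-cluster pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError("unknown linkage: " + linkage)

def agglomerative(data, k, linkage="average"):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]],
                                           data, linkage),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is still valid after deletion
    return sorted(sorted(c) for c in clusters)

# Hypothetical toy profiles: rows 0, 1, 4 are near the origin, rows 2, 3 near (5, 5).
profiles = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [0.1, 0.2]]

for method in ("single", "complete", "average", "centroid"):
    print(method, agglomerative(profiles, 2, linkage=method))
```

On well-separated toy data like this, all four linkages recover the same two groups. Differences between linkages appear on noisier data, which is the kind of comparison the abstract describes.
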

Keywords/Search Tags:

gene expression data, data mining, clustering analysis, validity

Related items

1	Association Rules Mining And Its Applications In Microarray Gene Expression Data
2	Study Of Gene Expression Data Analysis Based On Pattern Recognition Methods
3	The Research And Application Of Particle Swarm Optimization Algorithm In Clustering Analysis Of Gene Expression Data
4	The Research And Implementation On Clustering Algorithm Of Gene Expression Data
5	Research On Clustering Algorithms In Gene Expression Data Analyzing
6	Clustering Analysis Based On The Ant System For Gene Expression Data
7	The Design And Analysis Of Clustering Algorithms On Gene Expression Data
8	Study On Some Data Mining Methods For Gene Expression Data
9	Analysis Of Gene Expression Data Clustering Algorithm
10	Construction Of Gene Expression Data Mining Models