Clustering Algorithm Research Of Gene Expression Data

Posted on: 2010-09-11
Degree: Master
Type: Thesis
Country: China
Candidate: J F Li
Full Text: PDF
GTID: 2178360302959086
Subject: Computer application technology
Abstract/Summary:
Gene chip technology makes it possible to study the expression of thousands of genes simultaneously. However, effectively managing and analyzing the massive data produced by gene chip experiments has become a bottleneck in using this high-throughput technology. Clustering, a data mining technique, is a key approach to this problem, but because gene expression data are high-dimensional with small sample sizes and have their own particular characteristics, traditional clustering algorithms face many limitations. This thesis therefore studies and improves existing clustering algorithms for gene expression data analysis. First, there is no effective solution to the problems of choosing the number of clusters and initializing the cluster centers. This thesis therefore introduces the Xie-Beni index, which both evaluates the quality of clustering results and indicates the best number of clusters, and presents a center initialization method based on Prim's algorithm. Second, traditional clustering techniques usually have difficulty handling data from the particular field of bioinformatics effectively. We therefore combine the fuzzy c-means (FCM) algorithm with Shannon entropy theory to present a new fuzzy clustering method for gene expression data. The method defines the concept of an associated redundancy value, uses that value to judge the strength of the relationship between two genes, and then groups the genes and determines the grouping pattern; it can greatly improve the performance of clustering in bioinformatics applications. Finally, we analyze supervised clustering algorithms, which are applied to detect genes that are differentially expressed among different samples. To address the deficiencies of the support vector machine (SVM) and the decision tree (TREE), this thesis integrates the two methods into an improved supervised clustering method for gene expression data, SVM-TREE, designs an experimental model, and verifies its good classification performance through comparative analysis.
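As an illustration of how a validity index can select the number of clusters, the following minimal sketch runs standard fuzzy c-means for several candidate cluster counts and keeps the count with the lowest Xie-Beni index (compactness divided by separation, lower is better). It is not the thesis's implementation: the function names are hypothetical, the fuzzifier m = 2 is an assumed default, and the farthest-first seeding is a simple stand-in, not the Prim-based initialization the thesis proposes.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(data, c):
    """Deterministic seeding (an assumption, NOT the thesis's Prim-based method)."""
    centers = [list(data[0])]
    while len(centers) < c:
        nxt = max(data, key=lambda p: min(sq_dist(p, v) for v in centers))
        centers.append(list(nxt))
    return centers


def memberships(data, centers, m):
    """u[k][i] is proportional to D_ki^(-1/(m-1)), D = squared distance."""
    u = []
    for p in data:
        # Floor the distance so a point sitting on a center does not divide by zero.
        inv = [max(sq_dist(p, v), 1e-12) ** (-1.0 / (m - 1)) for v in centers]
        total = sum(inv)
        u.append([w / total for w in inv])
    return u


def fcm(data, c, m=2.0, iters=200, tol=1e-9):
    """Standard fuzzy c-means; returns (centers, memberships)."""
    centers = farthest_first(data, c)
    dim = len(data[0])
    for _ in range(iters):
        u = memberships(data, centers, m)
        new_centers = []
        for i in range(c):
            # Each center is the mean of all points weighted by u_ki^m.
            w = [row[i] ** m for row in u]
            sw = sum(w)
            new_centers.append(
                [sum(wk * p[d] for wk, p in zip(w, data)) / sw for d in range(dim)]
            )
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(data, centers, m)


def xie_beni(data, centers, u, m=2.0):
    """Xie-Beni index: fuzzy compactness / (n * min squared center separation)."""
    compact = sum(
        u[k][i] ** m * sq_dist(p, v)
        for k, p in enumerate(data)
        for i, v in enumerate(centers)
    )
    sep = min(
        sq_dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]
    )
    return float("inf") if sep == 0 else compact / (len(data) * sep)


def best_cluster_count(data, candidates, m=2.0):
    """Return (best c, {c: index}) over candidate cluster counts (each >= 2)."""
    scores = {}
    for c in candidates:
        centers, u = fcm(data, c, m)
        scores[c] = xie_beni(data, centers, u, m)
    return min(scores, key=scores.get), scores
```

Calling `best_cluster_count(points, range(2, 6))` on a list of equal-length numeric tuples returns the chosen cluster count together with the index value for each candidate. For real expression matrices, a vectorized implementation would be preferable to these pure-Python loops.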
Keywords/Search Tags: Clustering, Gene Expression Data, FCM, Associated Redundancy Value, SVM-TREE