The Research On Feature Weighted Clustering Algorithm For Gene Expression Data

Posted on: 2012-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: R Wei
GTID: 2178330338990562
Subject: Computer software and theory
Abstract/Summary:
DNA microarray technology is widely applied in bioinformatics and produces vast amounts of gene expression data, an important resource for mining gene patterns and understanding gene function. How to analyze these data has become a focus of bioinformatics research in the post-genome era. Clustering plays an important role in gene expression data analysis. Several cluster analysis methods have been applied to gene expression data with some success, but many problems have arisen in practice. This thesis proposes an approach to designing clustering algorithms for gene expression data analysis. It consists of three parts.

In the first part, the thesis proposes a pre-processing algorithm for gene expression data sets. Its main purpose is to address the sensitivity to initial values and the parameter dependence of the FCM (fuzzy c-means) algorithm. The pre-processing algorithm has two stages. In the first stage, it produces several small clusters through sampling. In the second stage, it merges those small clusters based on the physical meaning of entropy and then determines the number of clusters and their representative points. Experiments on real gene expression data sets show that the pre-processing algorithm can effectively determine the actual number of clusters and the representative points.

In the second part, because the FCM algorithm cannot distinguish the different roles played by the attributes of gene expression data, the thesis proposes a method based on feature weighting to solve this problem. It describes how to obtain the feature weights of a data set, the criterion function of the FCM algorithm once feature weights are introduced, the calculation of cluster centers, and the steps of the feature-weighted FCM algorithm.
Experiments on real gene expression data sets show that the feature-weighted FCM algorithm has advantages in clustering precision.

In the third part, the thesis proposes a new feature weighting method with a different goal: making the clustering results more biologically meaningful. The principle and the steps for obtaining the attribute weights of a data set are described in detail, and a feature-weighted FCM algorithm based on information entropy is proposed. Experiments on real gene expression data sets show that the new algorithm has advantages in both clustering precision and the biological significance of the results.
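
The feature-weighted FCM idea summarized above can be illustrated with a minimal sketch. The abstract does not give the thesis's criterion function, so this sketch assumes a commonly used form in which each feature's squared difference is multiplied by a fixed, non-negative weight. The function names, the parameters, and the random choice of initial centers are all hypothetical; the thesis obtains its starting points from the pre-processing step and its weights from its own weighting methods.

```python
import random


def weighted_sq_dist(x, v, w):
    """Squared Euclidean distance with a per-feature weight w[j]."""
    return sum(wj * (xj - vj) ** 2 for xj, vj, wj in zip(x, v, w))


def memberships_for(data, centers, weights, m):
    """Fuzzy membership of every sample in every cluster (rows sum to 1)."""
    rows = []
    for x in data:
        d = [weighted_sq_dist(x, v, weights) for v in centers]
        if any(di == 0.0 for di in d):
            # Sample coincides with one or more centers: share the full
            # membership equally among those centers.
            zeros = [di == 0.0 for di in d]
            share = 1.0 / sum(zeros)
            rows.append([share if z else 0.0 for z in zeros])
        else:
            # d holds squared distances, hence the exponent -1 / (m - 1).
            inv = [di ** (-1.0 / (m - 1.0)) for di in d]
            total = sum(inv)
            rows.append([value / total for value in inv])
    return rows


def feature_weighted_fcm(data, weights, n_clusters, m=2.0,
                         n_iter=100, tol=1e-6, seed=0):
    """Fuzzy c-means with fixed feature weights (illustrative sketch).

    data: list of samples, each a list of floats.
    weights: one non-negative weight per feature, assumed to be given.
    Returns (centers, memberships).
    """
    rng = random.Random(seed)
    n_features = len(data[0])
    # Assumption: start from randomly chosen samples.
    centers = [list(x) for x in rng.sample(data, n_clusters)]
    for _ in range(n_iter):
        u = memberships_for(data, centers, weights, m)
        new_centers = []
        for i in range(n_clusters):
            um = [row[i] ** m for row in u]
            denom = sum(um)
            # With fixed weights, the center update has the same form
            # as in standard FCM.
            new_centers.append([
                sum(ui * x[j] for ui, x in zip(um, data)) / denom
                for j in range(n_features)
            ])
        shift = max(abs(a - b)
                    for old, new in zip(centers, new_centers)
                    for a, b in zip(old, new))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships_for(data, centers, weights, m)
```

In this sketch, a feature with weight 0 has no influence on the memberships, while a larger weight makes differences in that feature count more toward the distance. This is one way to express the different roles that attributes play in the clustering.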
Keywords/Search Tags:Gene Expression Data, Clustering Analysis, FCM Algorithm, ReliefF, Feature Weighte