
The Approach To Mining Time-lagged Coregulated Gene And Research On Fuzzy Clustering Algorithm

Posted on: 2014-07-03 | Degree: Master | Type: Thesis
Country: China | Candidate: J Xue | Full Text: PDF
GTID: 2308330461972570 | Subject: Computer application technology
Abstract/Summary:
Gene microarray technology is widely applied in bioinformatics and produces vast amounts of gene expression data, which are an important resource for mining gene patterns and understanding gene function. Data mining, which combines statistics, artificial intelligence, machine learning and related techniques, has been proposed to address this task. As an unsupervised learning method, cluster analysis is an important part of data mining and an important method of data partitioning and grouping. The purpose of clustering is to partition an unlabeled data set into several clusters according to some similarity measure. Traditional fast clustering algorithms are sensitive to the initial cluster centers and to noise, and are prone to converging to local minima, which leads to low clustering accuracy. To reduce the influence of noisy feature vectors on clustering and to take full account of the different contributions of feature vectors, a new kernel-based fuzzy clustering algorithm with dynamic weights is presented. The new algorithm properly reflects the importance of the attributes of each feature vector and can therefore yield much higher clustering accuracy than conventional clustering algorithms.
Changes in gene expression are often associated with changes in gene function, and clusters of gene expression patterns are believed to help identify co-expressed genes and their regulation. Static similarity measures based on distance or on the correlation coefficient assume that co-expressed gene expression sequences follow the same trend at the same time points. Time-lagged regulation between co-expressed genes is, however, a common phenomenon, so conventional algorithms cannot effectively measure the time-lagged relationship between genes. To mine time-lagged genes, this paper proposes a new similarity measure based on the dynamic time warping (DTW) algorithm. By introducing a sliding window algorithm and a maximum warping threshold, the new similarity measure can effectively handle co-expressed genes with complex time-delay characteristics. The paper also analyzes the new similarity measure statistically by integrating it with affinity propagation clustering for a given number of clusters. Experiments on a synthetic dataset and a real gene expression dataset show that the proposed algorithm correctly selects potential regulatory genes and is more accurate than other algorithms.
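To illustrate why a DTW-based measure can capture time-lagged co-expression where a point-by-point distance cannot, the following is a minimal sketch of DTW with a band constraint. It is not the similarity measure proposed in the thesis: the function name `banded_dtw`, the parameter `max_warp` (a simple stand-in for a maximum warping threshold), and the two expression profiles are all hypothetical and chosen only for illustration.

```python
import math


def banded_dtw(x, y, max_warp):
    """DTW distance between x and y, allowing index offsets |i - j| <= max_warp.

    max_warp is a hypothetical stand-in for a maximum warping threshold;
    it is not the thesis's actual parameterization.
    """
    n, m = len(x), len(y)
    # The band must be at least |n - m| wide so the final cell is reachable.
    w = max(max_warp, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Illustrative profiles: b is a copy of a delayed by one time step.
a = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
b = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]

pointwise = sum(abs(p - q) for p, q in zip(a, b))  # static, same-time comparison
lagged = banded_dtw(a, b, max_warp=1)              # allows a one-step lag

print(pointwise, lagged)
```

In this sketch the point-by-point distance treats the delayed profile as dissimilar, while the banded DTW distance aligns the two profiles exactly once a one-step lag is allowed. Setting `max_warp=0` forces a same-time alignment and reproduces the point-by-point result.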
Keywords/Search Tags:gene expression data, fuzzy clustering, similarity measure, Affinity propagation clustering, time lag