
Study On Techniques Of MC-pattern Based Sequence Projection Clustering

Posted on: 2015-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: G C Tian
Full Text: PDF
GTID: 2308330482456332
Subject: Computer application technology
Abstract/Summary:
Projection clustering is an important technique for clustering high-dimensional data. Unlike subspace clustering, it requires the objects to be partitioned into mutually exclusive clusters, so no object may belong to more than one cluster. Because its results discriminate well between groups, projection clustering has begun to be applied to gene expression data analysis, where it is used to distinguish specific phenotypes. Most existing methods address the chicken-and-egg problem with an iterative adjustment framework, and they often suffer from three drawbacks: (1) sensitivity to the adjustment order, (2) an unreasonable assumption that genes are independent, and (3) the selection of too many genes with low discriminability.

In this thesis, we develop a novel framework named MCPC. Unlike previous work, it is non-iterative and exploits the structural ordering information among genes. It is therefore not affected by the adjustment order sensitivity problem and is not bound by the gene independence assumption. Moreover, because it exploits information that previous work missed, it improves the accuracy of class discovery while using far fewer genes.

Based on the concepts of projection divergence and (k,l)-availability, the algorithm computes a score for each representative discriminative subsequence. Using the highest-scoring representative discriminative subsequences, it then clusters the samples and finds the diagnostic genes. The algorithm consists of three parts:
(1) The microarray data are transformed into the g*-sequence model and stored in a location-matrix data structure.
(2) For each sample, the maximal distinguishing subsequence is found by enumerating subsequences with a template-driven method. The problem is formalized as a novel MC-subsequence mining problem, and the search applies a number of effective pruning rules.
(3) The samples are divided into blocks according to their maximal distinguishing subsequences. The number of blocks is far larger than the user-specified K, so the blocks are finally clustered into K classes, and the diagnostic genes are discovered in the process.

Extensive experiments show that MCPC improves both the accuracy and the efficiency of class discovery compared with current methods and can discover the diagnostic genes. The results are meaningful both biologically and statistically.
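
Part (1) of the algorithm can be illustrated with a small sketch. The abstract does not define the g*-sequence model or the layout of the location matrix, so everything below is an assumption made for illustration: each sample is turned into a sequence of gene indices ordered by ascending expression value, and the location matrix stores, for every sample, the position of each gene in that sequence. The function names (`to_gene_sequence`, `location_matrix`, `supports`, `supporting_samples`) are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence


def to_gene_sequence(expression: Sequence[float]) -> List[int]:
    """Order gene indices by ascending expression value.

    Assumption: ties are broken by gene index. The thesis's g*-sequence
    model may treat near-equal values differently.
    """
    return sorted(range(len(expression)), key=lambda g: (expression[g], g))


def location_matrix(samples: Sequence[Sequence[float]]) -> List[List[int]]:
    """Build a matrix where row i, column g holds the position of gene g
    in the ordered gene sequence of sample i (an assumed layout)."""
    matrix = []
    for expression in samples:
        positions = [0] * len(expression)
        for pos, gene in enumerate(to_gene_sequence(expression)):
            positions[gene] = pos
        matrix.append(positions)
    return matrix


def supports(positions: Sequence[int], subsequence: Sequence[int]) -> bool:
    """Check whether the genes of `subsequence` appear in that order in a
    sample, using only the sample's row of the location matrix."""
    return all(positions[a] < positions[b]
               for a, b in zip(subsequence, subsequence[1:]))


def supporting_samples(loc: Sequence[Sequence[int]],
                       subsequence: Sequence[int]) -> List[int]:
    """Return the indices of all samples that contain `subsequence`."""
    return [i for i, row in enumerate(loc) if supports(row, subsequence)]


if __name__ == "__main__":
    # Toy data: 3 samples (rows) x 3 genes (columns), values invented here.
    samples = [
        [0.2, 1.5, 0.9],
        [1.1, 0.3, 0.7],
        [0.1, 0.8, 0.5],
    ]
    loc = location_matrix(samples)
    print(supporting_samples(loc, [0, 2, 1]))  # samples where gene0 < gene2 < gene1
```

Under these assumptions, storing positions rather than the sequences themselves lets a candidate subsequence be tested against a sample with one lookup per gene, which is the kind of check a subsequence enumeration with pruning would perform repeatedly.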
Keywords/Search Tags:Sequence model mining, Projection clustering, MC-subsequence, Diagnostic gene, Equivalent dimensional group