
Semi-supervised Clustering Ensemble For Bio-molecular Pattern Mining

Posted on: 2017-09-27
Degree: Master
Type: Thesis
Country: China
Candidate: H S Chen
GTID: 2334330536953095
Subject: Computer Science and Technology
Abstract/Summary:
The development of DNA microarray techniques allows us to monitor the expression levels of thousands of genes in a cancerous cell at the same time, which brings new opportunities to better understand and treat cancer. Clustering is one of the most important tools for analyzing tumor gene expression profiles. Through clustering we are able to discover underlying tumor subclasses, which provides a reliable theoretical foundation for the diagnosis and further treatment of cancer. Existing tumor clustering methods have several limitations: 1) they do not incorporate expert knowledge to guide the clustering process and improve the accuracy of the results; 2) few of them are able to cope with the large number of irrelevant and noisy genes in high-dimensional data, or to leverage domain knowledge to choose relevant genes; 3) a single clustering result is not capable of revealing the underlying structure of the data. To cope with these limitations, this thesis proposes two semi-supervised clustering ensemble frameworks.

The first is the double feature selection based semi-supervised clustering ensemble (MDS-SSCE). Its main contributions are: 1) the integration of expert knowledge, in the form of pairwise constraints, to guide the clustering procedure; 2) the use of feature selection techniques to remove irrelevant and noisy genes; 3) the use of feature selection to perform cluster solution selection.

The second is the adaptive random subspace based semi-supervised clustering ensemble (ARSEMICE). Its main contributions are three-fold: 1) the adoption of random subspaces to reduce the side effects of noisy features; 2) the application of transitive closure to the pairwise constraints, to make full use of the supervision and obtain a more accurate result; 3) an adaptive procedure that picks an optimal subset of the random subspaces.

In the experimental section, the effectiveness and efficiency of the proposed frameworks are evaluated on several gene expression datasets. The results show that the proposed algorithms achieve more accurate and robust results than other state-of-the-art algorithms on high-dimensional datasets.
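
The abstract does not say how ARSEMICE computes the transitive closure of the pairwise constraints. The following is only a minimal sketch of the general idea, assuming the constraints are given as must-link and cannot-link pairs of sample indices. The function name `constraint_closure` and its parameters are hypothetical and do not come from the thesis.

```python
from itertools import combinations


def constraint_closure(n, must_link, cannot_link):
    """Illustrative transitive closure of pairwise constraints over n samples.

    Hypothetical helper, not the thesis's implementation.
    Returns (must_link_pairs, cannot_link_pairs) as sets of (i, j) with i < j.
    """
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Must-link is transitive: merge linked samples into groups.
    for a, b in must_link:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    # Every pair inside a group is an implied must-link.
    ml = {pair for members in groups.values()
          for pair in combinations(members, 2)}

    # A cannot-link between two samples extends to their whole groups.
    cl = set()
    for a, b in cannot_link:
        ra, rb = find(a), find(b)
        if ra == rb:
            raise ValueError(f"inconsistent constraints on samples {a} and {b}")
        for x in groups[ra]:
            for y in groups[rb]:
                cl.add((min(x, y), max(x, y)))
    return ml, cl
```

For example, with must-links (0, 1) and (1, 2) and a cannot-link (2, 3), the closure adds the must-link (0, 2) and the cannot-links (0, 3) and (1, 3), so three supplied constraints yield six usable ones.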
Keywords/Search Tags: Feature selection, Semi-supervised clustering, Clustering ensemble, Evolutionary computation