Semi-supervised Clustering Ensemble For Bio-molecular Pattern Mining

Posted on:2017-09-27

Degree:Master

Type:Thesis

Country:China

Candidate:H S Chen

Full Text:PDF

GTID:2334330536953095

Subject:Computer Science and Technology

Abstract/Summary:

The development of DNA microarray techniques allows us to monitor the expression levels of thousands of genes in a cancerous cell at the same time, which brings a new opportunity to better understand and treat cancer. Clustering is one of the most important tools for analyzing tumor gene expression profiles. Through clustering we can discover underlying tumor subclasses, which provides a reliable theoretical foundation for the diagnosis and further treatment of cancer. Existing tumor clustering methods have several limitations: 1) they do not incorporate expert knowledge to guide the clustering process and improve the accuracy of the results; 2) few of them can cope with the large number of irrelevant and noisy genes in high-dimensional data, or leverage domain knowledge to choose relevant genes; 3) a single clustering result is not capable of revealing the underlying structure of the data. To cope with these limitations, this paper proposes two semi-supervised clustering ensemble frameworks. The first is the double feature selection based semi-supervised clustering ensemble (MDS-SSCE). Its main contributions are: 1) the integration of expert knowledge, in the form of pairwise constraints, to guide the clustering procedure; 2) the use of feature selection techniques to remove irrelevant and noisy genes; 3) the use of feature selection to perform cluster solution selection. The second is the adaptive random subspace based semi-supervised clustering ensemble (ARSEMICE). Its main contributions are threefold: 1) the adoption of random subspaces to reduce the side effects of noisy features; 2) the application of transitive closure to the pairwise constraints, to make full use of the supervision and obtain a more accurate result; 3) an adaptive procedure that picks an optimal subset of the random subspaces. In the experiments, the effectiveness and efficiency of the proposed frameworks are evaluated on several gene expression datasets. The results show that the proposed frameworks can achieve more accurate and robust results than other state-of-the-art algorithms on high-dimensional datasets.
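
The abstract mentions taking the transitive closure of the pairwise constraints but does not spell the step out. A minimal Python sketch of one common way to do this follows. It assumes the constraints come as must-link and cannot-link pairs of sample indices, which is an assumption about their form; the function name close_constraints and its arguments are hypothetical and are not taken from the thesis.

```python
from itertools import combinations, product


def close_constraints(n_samples, must_link, cannot_link):
    """Return the transitive closure of pairwise constraints.

    must_link and cannot_link are iterables of (i, j) sample-index pairs.
    The result is two sets of pairs (i, j) with i < j.
    Hypothetical helper for illustration, not the thesis implementation.
    """
    parent = list(range(n_samples))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Must-link behaves like an equivalence relation: merge linked samples.
    for i, j in must_link:
        parent[find(i)] = find(j)

    # Group samples by the component they ended up in.
    groups = {}
    for i in range(n_samples):
        groups.setdefault(find(i), []).append(i)

    # Every pair inside one component is implied to be must-link.
    closed_ml = set()
    for members in groups.values():
        closed_ml.update(combinations(members, 2))

    # A cannot-link between two samples extends to their whole components.
    closed_cl = set()
    for i, j in cannot_link:
        ri, rj = find(i), find(j)
        if ri == rj:
            raise ValueError(
                f"samples {i} and {j} are both must-link and cannot-link"
            )
        for a, b in product(groups[ri], groups[rj]):
            closed_cl.add((min(a, b), max(a, b)))

    return closed_ml, closed_cl


if __name__ == "__main__":
    ml, cl = close_constraints(
        6, must_link=[(0, 1), (1, 2), (3, 4)], cannot_link=[(2, 3)]
    )
    print(sorted(ml))
    print(sorted(cl))
```

In this toy example, three must-link pairs and one cannot-link pair expand to four implied must-link pairs and six implied cannot-link pairs, while sample 5 stays unconstrained. This illustrates how closure can extract more supervision from the same expert input.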

Keywords/Search Tags:

Feature selection, Semi-supervised clustering, Clustering ensemble, Evolutionary computation

Related items

1	Research On Multi-center Autism Data Clustering Algorithm And Its Application In Disease Classification
2	Dimensionality Reduction And Clustering Ensemble Of Tumor Gene Expression Profile
3	Evolutionary Clustering Algorithm And Its Application To Medical Data Analysis
4	Research Of Clustering Strategies For Dynamic Electrocardiogram Waveform
5	Analysis And Research On Erythrocyte Morphology Based On Clustering Algorithm
6	Non-negative Matrix Factorization Based Clustering Research For Cancer Gene Expression Data
7	Research And Implementation Of Aided Diagnosis System For Infectious Liver Disease Based On Ensemble Learning
8	Multi-label Feature Selection Algorithm Based On Sample Differences
9	Feature selection for clustering of functional magnetic resonance imaging data
10	Multi-task Feature Selection Algorithm And Its Application For Multimodal Neuro Image