
Multi-objective Cluster Ensemble Selection For High Dimensional Data

Posted on: 2019-03-07  Degree: Master  Type: Thesis
Country: China  Candidate: W J Huang
GTID: 2428330566987573  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, the amount of data has grown explosively. As a result, clustering high-dimensional data has become increasingly important. The objective of clustering is to group data into clusters based on similarity, and it plays an important role in many applications, such as information retrieval. However, because prior knowledge about the data is lacking, no single clustering algorithm can handle all kinds of datasets. Cluster ensembles were proposed to address this problem. A cluster ensemble first generates a set of diverse clustering solutions, using methods such as bagging, random projection, and different clustering algorithms, and then obtains a final clustering result with a consensus function. However, low-quality solutions in the ensemble may hurt the performance of the final result. Cluster ensemble selection can select a compact subset of the clustering solutions and improve the accuracy of the cluster ensemble. Although traditional cluster ensemble selection methods take both accuracy and diversity into consideration, they cannot adaptively make a proper tradeoff between the two, because datasets with different characteristics need different treatment. In this thesis, we propose four instance stability indices to rank the instances and separate them into an easy instance group and a hard instance group, where the size of each group is determined by the proposed dataset stability score. We use the quality in the easy instance group and the diversity in the hard instance group as objective functions for adaptive cluster ensemble selection. Most traditional methods use a greedy forward selection strategy to optimize quality and diversity efficiently, but they do not achieve satisfactory results. Multi-objective evolutionary algorithms are an effective way to solve this problem, but there has been little research on applying them to cluster ensemble selection. We therefore propose a multi-objective evolutionary algorithm that adjusts the evolutionary direction based on the improvement condition. The multi-objective optimization process treats each selector as an individual and adopts the quality in the easy instance group and the diversity in the hard instance group as objective functions. We conducted a series of experiments on several real-world datasets to evaluate the performance of the proposed method. Experimental results demonstrate that our method is able to select a compact and accurate subset of clustering solutions and thereby improve cluster ensemble performance.
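The easy/hard split and the two objectives described in the abstract can be illustrated with a minimal sketch. The abstract does not define the four instance stability indices, the dataset stability score, or the quality and diversity measures, so everything below is an assumption made for illustration: `instance_stability` uses a simple co-association certainty, `easy_fraction` stands in for the dataset stability score, and `pair_agreement` is a Rand-index-style measure. All function names are hypothetical and are not taken from the thesis.

```python
import numpy as np

def co_association(labels):
    """labels: (n_solutions, n_instances) integer array of cluster labels.
    Returns an (n, n) matrix with the fraction of solutions that put each pair
    of instances in the same cluster."""
    labels = np.asarray(labels)
    same = labels[:, :, None] == labels[:, None, :]
    return same.mean(axis=0)

def instance_stability(coassoc):
    """Assumed stability index (not one of the thesis's four): the mean
    certainty |2c - 1| of an instance's co-association with all other instances.
    Requires at least two instances."""
    n = coassoc.shape[0]
    certainty = np.abs(2.0 * coassoc - 1.0)
    np.fill_diagonal(certainty, 0.0)  # ignore each instance's pairing with itself
    return certainty.sum(axis=1) / (n - 1)

def split_easy_hard(stability, easy_fraction):
    """Rank instances by stability; the most stable ones form the easy group.
    easy_fraction is a stand-in for the thesis's dataset stability score."""
    order = np.argsort(-stability, kind="stable")
    n_easy = int(round(easy_fraction * len(stability)))
    return np.sort(order[:n_easy]), np.sort(order[n_easy:])

def pair_agreement(a, b):
    """Rand-index-style agreement: fraction of instance pairs on which two
    labelings make the same together/apart decision."""
    if len(a) < 2:
        return 1.0  # no pairs to disagree on
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)
    return float((same_a == same_b)[upper].mean())

def objectives(labels, selected, easy, hard):
    """Return (quality on easy instances, diversity on hard instances) for the
    subset of solutions whose row indices are listed in `selected`."""
    labels = np.asarray(labels)
    chosen = labels[selected]
    # Quality: mean agreement of each chosen solution with every ensemble
    # member (including itself), measured on the easy instances only.
    quality = float(np.mean([pair_agreement(s[easy], t[easy])
                             for s in chosen for t in labels]))
    # Diversity: one minus the mean pairwise agreement among the chosen
    # solutions, measured on the hard instances only.
    pairs = [(i, j) for i in range(len(chosen)) for j in range(i + 1, len(chosen))]
    if not pairs:
        return quality, 0.0
    agreement = np.mean([pair_agreement(chosen[i][hard], chosen[j][hard])
                         for i, j in pairs])
    return quality, 1.0 - float(agreement)
```

In a multi-objective evolutionary search, each individual would encode one candidate subset such as `selected`, and `objectives` would supply the two values to be maximized. The evolutionary operators and the direction-adjustment rule are not described in the abstract and are not sketched here.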
Keywords/Search Tags: Ensemble Learning, Multi-objective Optimization, Evolutionary Algorithm, Machine Learning, Data Mining