
Research On Semi-Supervised Clustering Ensemble Model

Posted on: 2013-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: X H Ni
Full Text: PDF
GTID: 2248330371494728
Subject: Computer application technology
Abstract/Summary:
As one of the most important techniques in machine learning, clustering has been widely used in many fields to solve practical problems. Based on the similarity between objects, a clustering algorithm divides a data set with unknown distribution into clusters, following the principle that similarity between objects in the same cluster should be maximized and similarity between objects in different clusters should be minimized. However, most clustering algorithms are unsupervised learning methods and cannot use prior knowledge effectively. In addition, a single clustering algorithm has difficulty recovering the actual distribution structure of the samples because of the complexity of the data structure and the diversity of optimization criteria in clustering. To improve the stability of clustering, some scholars have proposed clustering ensemble techniques, but traditional clustering ensemble methods cannot use background knowledge to guide the integration of clusterings. Semi-supervised clustering ensemble techniques have emerged to address this limitation and improve ensemble performance.
The incorporation of background knowledge in clustering has recently attracted considerable attention. In particular, instance-based Must-Link and Cannot-Link constraints are two widely used forms of background knowledge in semi-supervised clustering. While the inclusion of instance-based constraints has shown potential to improve clustering accuracy, the quality and quantity of the constraint sets often dictate the level of improvement attained. In this work we demonstrate that it is possible to improve the quality and quantity of constraint sets by adopting a method that combines automated and active constraint selection.
This method exploits the observation that most data objects in a cluster are core points and only a small fraction are border points. Applying automatic constraint selection between core points and active constraint selection between border points therefore adds more useful information. Our experiments show that this approach provides a competitive edge in identifying informative constraints that enhance the accuracy of clustering solutions.
Clustering ensemble techniques can produce more accurate and stable partitions by combining diverse clustering results. Because traditional clustering ensemble methods do not make good use of prior knowledge to guide the process, in this thesis I propose a semi-supervised clustering ensemble method based on the mixture model. In the iterative EM process of the mixture model, a small number of class labels are added to improve the computation and obtain better performance. Experimental comparisons show that my method achieves better performance and quality than the mixture model ensemble method without class labels and than other clustering ensemble algorithms.
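The abstract does not give the details of the semi-supervised EM procedure, so the following is only a minimal sketch under stated assumptions. It assumes a common formulation of mixture-model ensembles in which each object is described by its labels in the base clusterings and each consensus cluster is a product of categorical distributions over those labels. The few known class labels are used by clamping the posterior of labeled objects during EM. The function name `semi_supervised_em_ensemble`, its parameters, and the toy data are hypothetical and not taken from the thesis.

```python
import math

def semi_supervised_em_ensemble(base_labels, k, known, n_iter=30, smooth=0.01):
    """Consensus clustering by EM on a mixture of categorical distributions.

    base_labels[i][j] is the label of object i in base clustering j.
    known maps an object index to its known class in range(k) (assumed input).
    """
    n, h = len(base_labels), len(base_labels[0])
    sizes = [max(row[j] for row in base_labels) + 1 for j in range(h)]

    # Labeled objects start (and stay) one-hot; unlabeled ones start uniform.
    # With a uniform start, the known labels are what breaks the symmetry.
    resp = [
        [1.0 if c == known[i] else 0.0 for c in range(k)] if i in known
        else [1.0 / k] * k
        for i in range(n)
    ]

    for _ in range(n_iter):
        # M-step: mixing weights and per-clustering label distributions.
        weights = [sum(resp[i][c] for i in range(n)) / n for c in range(k)]
        theta = []
        for c in range(k):
            dists = []
            for j in range(h):
                counts = [smooth] * sizes[j]  # smoothing avoids log(0)
                for i in range(n):
                    counts[base_labels[i][j]] += resp[i][c]
                total = sum(counts)
                dists.append([v / total for v in counts])
            theta.append(dists)

        # E-step: update posteriors of unlabeled objects only.
        for i in range(n):
            if i in known:
                continue  # posterior of a labeled object stays clamped
            logp = [
                math.log(weights[c] + 1e-12)
                + sum(math.log(theta[c][j][base_labels[i][j]]) for j in range(h))
                for c in range(k)
            ]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            s = sum(p)
            resp[i] = [v / s for v in p]

    # Hard assignment: most probable consensus cluster per object.
    return [max(range(k), key=lambda c: resp[i][c]) for i in range(n)]

# Toy ensemble: 6 objects, 3 base clusterings that mostly agree on two groups.
base_labels = [
    [0, 0, 1], [0, 0, 1], [0, 1, 1],
    [1, 1, 0], [1, 1, 0], [1, 0, 0],
]
consensus = semi_supervised_em_ensemble(base_labels, k=2, known={0: 0, 3: 1})
print(consensus)
```

In this sketch the labeled objects also fix which consensus cluster corresponds to which class, so the output labels can be read directly as class predictions. Other ways of using the labels (for example, weighting them rather than clamping) are equally plausible readings of the abstract.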
Keywords/Search Tags: Data Mining, Clustering, Constraint Selection, Mixture Model, Semi-supervised Clustering Ensemble