
The Research On Soft-hard Mixed Clustering And Its Ensemble

Posted on: 2014-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: S F Chen
GTID: 2248330398479443
Subject: Computer technology
Abstract/Summary:
In recent decades, society has made great progress along with the development of science and technology, and at the same time various fields have accumulated large amounts of data. In 1989, KDD (Knowledge Discovery in Databases) was first put forward at the Eleventh International Joint Conference on Artificial Intelligence. Since then, the discipline has attracted researchers from many fields and become interdisciplinary, giving rise to Data Mining. Today, Data Mining mainly studies association rules, classification, clustering analysis, forecasting, Web mining and so on. Clustering analysis groups data objects into clusters so that the similarity between objects is highest within the same cluster and lowest between different clusters. Clustering divides samples that carry no training labels into meaningful clusters, and it therefore belongs to unsupervised learning. There are currently many types of clustering algorithms. According to the clustering rule, they can be roughly divided into five categories. Each clustering algorithm performs better than the others within a certain scope of application and on certain data sets. However, no single clustering algorithm can reveal the structure of every kind of multidimensional data set, since such data sets contain a variety of structures.
In general, clustering algorithms face the following research problems: scalability, since some algorithms work well on small data sets but not on large ones; the need for prior knowledge to determine input parameters, for example the K-means algorithm requires the number of clusters K as input; the inability of some algorithms to identify clusters of arbitrary shape; and, for some algorithms, a lack of studies of their validity on attribute data.
Ensemble Learning uses multiple base learners to solve the same problem, which can significantly improve the generalization ability of the learning system. Building on this idea, Strehl proposed the Cluster Ensemble and gave its definition. A Cluster Ensemble integrates multiple base clustering results into a new partition that shares as much information as possible with all of the input base clustering results. There are many types of Cluster Ensemble algorithms, which can be broadly divided into three categories. A Cluster Ensemble has better generalization ability and is able to uncover the underlying structure of the data set.
In terms of their results, clustering algorithms can be divided into two categories: soft clustering and hard clustering. Soft clustering uses membership degrees to describe the affiliation between a sample and each cluster. Hard clustering assigns each sample to one particular cluster, with no relationship between the sample and the other clusters. From the perspective of the mathematical model, soft clustering is based on fuzzy mathematics. This thesis first studies clustering algorithms. It then studies the combination of soft and hard clustering, which uses a fuzzy similarity matrix to separate the fuzzy samples from the common samples and then applies soft and hard clustering methods to the respective types of samples. In the experiments, the combined soft-hard clustering algorithm is compared with the K-means and FCM algorithms.
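The difference between soft and hard assignment, and the idea of separating fuzzy samples from common ones, can be illustrated with a minimal Python sketch. It uses the standard fuzzy c-means (FCM) membership formula for fixed cluster centers. The abstract does not describe how the fuzzy similarity matrix is built, so the split below relies on a simple membership threshold as a stand-in; the function names, the threshold value 0.7, and the sample data are all assumptions made for illustration, not the thesis's method.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership of each point in each cluster, for fixed centers."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with a center: give it full membership there.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [r / total for r in row]
        else:
            row = [1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
                   for d_j in dists]
        memberships.append(row)
    return memberships

def hard_labels(memberships):
    """Hard clustering view: each sample belongs only to its strongest cluster."""
    return [row.index(max(row)) for row in memberships]

def split_fuzzy_and_common(memberships, threshold=0.7):
    """Assumed rule: a sample is 'fuzzy' if no cluster reaches the threshold."""
    fuzzy, common = [], []
    for i, row in enumerate(memberships):
        (common if max(row) >= threshold else fuzzy).append(i)
    return fuzzy, common

# Hypothetical 2-D data: three points near a center, one halfway between centers.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]
u = fcm_memberships(points, centers)
fuzzy, common = split_fuzzy_and_common(u)
print("fuzzy samples:", fuzzy, "common samples:", common)
```

In this toy data the point halfway between the two centers receives a membership of 0.5 in each cluster and is flagged as fuzzy, while the remaining points are treated as common samples that a hard assignment describes well.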
The final experimental results show that the combined soft-hard clustering algorithm is superior to both the purely soft FCM algorithm and the purely hard K-means algorithm. For the Cluster Ensemble, this thesis proposes a selective ensemble based on similarity. The algorithm introduces two new similarity measures of cluster membership and uses them to select base clustering results. Finally, it integrates the selected base clustering results to obtain a partition of the fuzzy samples and the common samples. The experiments show that the two similarity measures can select good base clustering results, and on UCI data sets the proposed cluster ensemble algorithm performs better than the common algorithms.
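The general shape of a similarity-based selective cluster ensemble can be sketched as follows. The abstract does not define the two proposed similarity measures, so this sketch substitutes the well-known Rand index, and it combines the selected clusterings with a co-association matrix cut at a fixed threshold. All function names, the `keep` and `threshold` parameters, and the example labelings are assumptions for illustration only.

```python
from itertools import combinations

def rand_index(a, b):
    """Fraction of sample pairs that two labelings treat the same way."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def select_base_clusterings(clusterings, keep):
    """Keep the `keep` clusterings with the highest mean similarity to the rest."""
    scores = []
    for idx, c in enumerate(clusterings):
        others = [rand_index(c, o) for k, o in enumerate(clusterings) if k != idx]
        scores.append((sum(others) / len(others), idx))
    best = sorted(scores, key=lambda s: (-s[0], s[1]))[:keep]
    return [clusterings[idx] for _, idx in sorted(best, key=lambda s: s[1])]

def co_association(clusterings):
    """Entry (i, j) is the fraction of clusterings that put i and j together."""
    n = len(clusterings[0])
    return [[sum(c[i] == c[j] for c in clusterings) / len(clusterings)
             for j in range(n)] for i in range(n)]

def consensus_labels(clusterings, threshold=0.5):
    """Link samples co-clustered more often than `threshold`; each connected
    component of the resulting graph becomes one consensus cluster."""
    co = co_association(clusterings)
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Hypothetical base clusterings of six samples; the last one is an outlier.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, with labels swapped
    [0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
]
selected = select_base_clusterings(base, keep=3)
print(consensus_labels(selected))
```

Selecting by mean pairwise similarity drops the base clustering that disagrees most with the others before the results are combined. Because the Rand index compares pairs of samples rather than label values, the first two labelings count as identical even though their labels are swapped.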
Keywords/Search Tags: Clustering, Cluster Ensemble, Soft Clustering, Hard Clustering, Fuzzy Mathematics