Research On Semi-supervised Clustering Ensemble Based On Soft-voting

Posted on:2015-02-24

Degree:Master

Type:Thesis

Country:China

Candidate:H S Wang

GTID:2268330428476089

Subject:Computer software and theory

Abstract/Summary:

Cluster analysis is one of the most widely used techniques in data mining. It first groups all data objects into clusters and then analyzes the results to uncover implicit information of practical value. Clustering partitions a large, disordered collection of data objects into several clusters according to their similarity, with the goal that objects within the same cluster are as similar as possible and objects in different clusters are as dissimilar as possible. A clustering ensemble takes as its base clusterings the results of different clustering algorithms, or of the same algorithm run several times with different parameter settings, selects an appropriate consensus function to integrate these base clusterings, and thereby obtains a new clustering result. Clustering is an unsupervised learning method; semi-supervised clustering incorporates a small amount of prior knowledge, known as semi-supervised information, into the clustering process to improve performance. A semi-supervised clustering ensemble combines the advantages of both approaches by using semi-supervised information to guide the ensemble toward a better result.
Depending on how objects are assigned to clusters, clustering methods are generally divided into two kinds: hard clustering and soft clustering. The result of hard clustering is a set of cluster labels, so each data object belongs to exactly one cluster. The result of soft clustering is a matrix of membership degrees, so each data object may belong to every cluster with a different degree of membership. Previous studies have shown that soft clustering outperforms hard clustering in some respects.
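The difference between the two output types can be made concrete with a small sketch. The membership values and the `harden` helper below are illustrative assumptions, not taken from the thesis; the sketch only shows how a membership matrix collapses to labels when each object is assigned to its highest-membership cluster.

```python
# Illustrative only: the membership degrees below are made up.

def harden(memberships):
    """Turn a soft membership matrix into hard labels by assigning each
    object to the cluster with the largest membership degree."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]

# Rows are data objects, columns are clusters; each row sums to 1.
soft = [
    [0.90, 0.05, 0.05],  # clearly in cluster 0
    [0.40, 0.35, 0.25],  # only weakly in cluster 0
    [0.10, 0.10, 0.80],  # clearly in cluster 2
]

hard = harden(soft)
print(hard)
```

The first two objects receive the same hard label even though their membership profiles differ considerably, which is the kind of information a label vector cannot carry.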
Traditional ensemble algorithms usually take hard clustering results as input. To combine an ensemble of soft clusterings with such an algorithm, the soft results must first be "hardened", and this process discards some valuable information. To solve this problem, this thesis proposes a new ensemble approach for soft clustering results, called Soft-Voting Clustering Ensemble. The algorithm offers better flexibility and generalization, and experiments show that it obtains better clustering results.
To further improve the performance of the Soft-Voting algorithm, the thesis also uses semi-supervised information to guide the clustering ensemble process. The semi-supervised information takes two forms, pairwise constraints and cluster labels, and a corresponding semi-supervised Soft-Voting clustering ensemble algorithm is designed for each. Experimental results show that both forms of semi-supervised information improve clustering accuracy to some extent.
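The abstract does not spell out the voting rule, so the following is only a generic sketch of how soft clusterings might be combined without hardening them first. It assumes that every base clustering uses the same number of clusters, that cluster indices are aligned to the first base clustering by brute-force permutation search (practical only for small cluster counts), and that votes are combined by averaging membership degrees. The function names `align` and `soft_vote` and all data are hypothetical, and the sketch does not include the semi-supervised variants.

```python
from itertools import permutations

def align(reference, memberships):
    """Reorder the columns of `memberships` to best match `reference`.

    Cluster indices are arbitrary across base clusterings, so every column
    permutation is tried and the one with the largest total overlap
    (sum of elementwise products) is kept.
    """
    k = len(reference[0])

    def overlap(perm):
        return sum(ref_row[j] * row[perm[j]]
                   for ref_row, row in zip(reference, memberships)
                   for j in range(k))

    best = max(permutations(range(k)), key=overlap)
    return [[row[best[j]] for j in range(k)] for row in memberships]

def soft_vote(base_clusterings):
    """Average aligned membership matrices into one consensus matrix."""
    reference = base_clusterings[0]
    aligned = [reference] + [align(reference, m) for m in base_clusterings[1:]]
    n, k, r = len(reference), len(reference[0]), len(aligned)
    return [[sum(m[i][j] for m in aligned) / r for j in range(k)]
            for i in range(n)]

# Two made-up base clusterings of three objects; the second one uses the
# opposite cluster numbering.
a = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]]
b = [[0.2, 0.8], [0.3, 0.7], [0.6, 0.4]]

consensus = soft_vote([a, b])
for row in consensus:
    print([round(x, 2) for x in row])
```

Because the consensus is itself a membership matrix, it can be hardened at the very end if labels are needed, rather than hardening each base clustering before the ensemble step.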

Keywords/Search Tags:

Clustering analysis, soft clustering, soft-voting, semi-supervised soft-voting

Related items

1	Research On Soft Voting Clustering Ensemble And Its Parallel Implementation
2	Semi-Supervised Clustering Analysis And Its Extended Research
3	Research On Theory And Technology Of Semi-Supervised Clustering Ensemble
4	Based On NIOS Ⅱ Soft-Core Fault-Tolerant And Voting System Design And Implementation
5	Clustering Analysis Based On Soft-DTW Distance And Its Applications In A-share Market
6	Research On Semi-supervised Clustering Ensemble Approach And Its Application
7	Research On Subspace Clustering Algorithm Guided By Soft Labels
8	Co-training Based Semi-supervised Soft Sensor Modeling
9	Multi-Model Modeling Method And Its Applications In Soft Sensor
10	Research And Implementation Of Clustering Ensemble Algorithm Basing On Voting Strategy