
Semi-supervised Clustering With Constraints Assessment

Posted on: 2014-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y Qiu
GTID: 2308330461972601
Subject: Computer software and theory
Abstract/Summary:
Data mining is the process of discovering implicit, previously unknown, and potentially valuable patterns and knowledge in large amounts of data. As a basic data mining method, cluster analysis partitions a data set into several clusters according to the similarity between objects. From a machine learning perspective, cluster analysis is an unsupervised learning method. In real applications, however, some domain knowledge about the data is often available, and instance-level constraint information can be derived from it. Incorporating such prior knowledge into clustering to improve the clustering result is known as semi-supervised clustering.

In recent years, the most popular semi-supervised clustering algorithms have been constraint-based ones, in which prior knowledge is represented as instance-level constraints. A must-link constraint between two objects states that they must be placed in the same cluster; a cannot-link constraint states that they must be placed in different clusters. Much work has demonstrated that incorporating constraints into clustering can effectively improve the clustering result. However, introducing constraints also creates problems. For example, the performance of the Cop-Kmeans (CKM) algorithm is directly affected by the order in which instances are assigned. To address this problem, UALA determines the assignment order of instances by assessing the stability of instances. Experimental results show that UALA improves the performance of CKM, but it also has some drawbacks.

In this thesis, we assess constraints in terms of the stability of instances and the conformity of constraints, and we change the assignment order of instances during the iteration process to improve clustering performance. Analysis of the Cop-Kmeans algorithm shows that the stability of an instance changes as constraints are added. Therefore, a Clustering Uncertainty based assignment order Iterative Learning Algorithm (UAILA) is proposed to reconcile the static computation of stability with its dynamic nature. UAILA recomputes stability iteratively, uses it to determine the assignment order of instances more reliably, and thereby obtains an assignment order better suited to CKM. Experimental results show that UAILA achieves better clustering performance than UALA and CKM.

In theory, more constraints should yield better performance. However, the experimental results show that this is not the case, and that an algorithm using only a single assignment order is easily trapped in a local optimum. To solve these two problems, a novel algorithm, Dynamic Assignment Order CKM (DAO-CKM), is proposed, which uses several assignment orders based on the conformity of constraints. By evaluating the conformity of constraints, the algorithm first obtains the correlation between constraints and clustering results, then uses a roulette strategy to select several assignment orders, and uses these orders to widen the search scope. Finally, it selects the best clustering result according to predefined assessment criteria. The experiments demonstrate the effectiveness and superiority of the DAO-CKM algorithm.
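
The sensitivity of Cop-Kmeans to assignment order can be illustrated with a minimal sketch. The code below is not the thesis's UALA, UAILA, or DAO-CKM; it only reproduces the commonly described Cop-Kmeans assignment step (each instance goes to the nearest center that does not violate a constraint with an already assigned instance). The function names `violates` and `constrained_assign`, the fixed centers, and the toy points are all assumptions made for illustration.

```python
import math


def violates(i, cluster, labels, must_link, cannot_link):
    """Return True if placing instance i in `cluster` breaks a constraint
    with an instance that has already been assigned."""
    for a, b in must_link:
        if i in (a, b):
            other = b if i == a else a
            # A must-link partner already sits in a different cluster.
            if labels.get(other) is not None and labels[other] != cluster:
                return True
    for a, b in cannot_link:
        if i in (a, b):
            other = b if i == a else a
            # A cannot-link partner already sits in this cluster.
            if labels.get(other) == cluster:
                return True
    return False


def constrained_assign(points, centers, order, must_link, cannot_link):
    """One constrained assignment pass over the instances in `order`.

    Returns a dict {instance index: cluster index}, or None if some
    instance has no feasible cluster (the pass fails).
    """
    labels = {}
    for i in order:
        # Try clusters from nearest center to farthest.
        ranked = sorted(range(len(centers)),
                        key=lambda c: math.dist(points[i], centers[c]))
        for c in ranked:
            if not violates(i, c, labels, must_link, cannot_link):
                labels[i] = c
                break
        else:
            return None  # every cluster violates some constraint
    return labels


# Hypothetical toy data: two fixed centers on a line.
centers = [(0.0, 0.0), (10.0, 0.0)]

# Case 1: a must-link pair straddling the midpoint.
pts_ml = [(4.0, 0.0), (6.0, 0.0)]
ml_first = constrained_assign(pts_ml, centers, [0, 1], [(0, 1)], [])
ml_second = constrained_assign(pts_ml, centers, [1, 0], [(0, 1)], [])

# Case 2: instance 2 cannot link with instance 0 or instance 1.
pts_cl = [(1.0, 0.0), (9.0, 0.0), (4.0, 0.0)]
cl = [(0, 2), (1, 2)]
cl_fail = constrained_assign(pts_cl, centers, [0, 1, 2], [], cl)
cl_ok = constrained_assign(pts_cl, centers, [2, 0, 1], [], cl)

print(ml_first, ml_second)
print(cl_fail, cl_ok)
```

In the first case the two orders put the must-link pair into different clusters; in the second case one order leaves instance 2 with no feasible cluster while the other order succeeds. This is the kind of order dependence that motivates choosing the assignment order deliberately.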
Keywords/Search Tags: data mining, machine learning, semi-supervised clustering, assignment order, instance-level constraints