Study On Semi-supervised Clustering Incorporating Instance-level And Attribute-level Knowledge And Applications

Posted on:2011-02-03

Degree:Master

Type:Thesis

Country:China

Candidate:S Y Wu

Full Text:PDF

GTID:2218330341951106

Subject:Computer software and theory

Abstract/Summary:

As an important data mining technique, clustering automatically groups similar objects and separates dissimilar ones. Traditional clustering relies on unsupervised learning and partitions objects according to a specific distance or similarity measure. Without any guidance from the user, however, these algorithms often fail to meet users' requirements: the results are usually hard to interpret and have low accuracy and stability. To address these problems, many researchers have proposed semi-supervised clustering algorithms, which incorporate a small amount of prior knowledge to improve user satisfaction and the intelligibility of the clustering results.

Pairwise constraints, which comprise must-link and cannot-link constraints, are an important type of prior knowledge. Compared with labels, instance-level knowledge in the form of pairwise constraints is more consistent with the definition of clustering. In practice, users may provide not only instance-level knowledge but also attribute-level knowledge. Attribute order preferences, an effective type of attribute-level knowledge, state the relative importance of a pair of attributes. By integrating attribute-level knowledge in the form of attribute order preferences, we can give important attributes clearly larger weights and obtain satisfactory results.

In practice, clustering can achieve better results by exploiting both types of prior knowledge. This thesis therefore studies how to integrate instance-level and attribute-level knowledge to assist clustering. The main work includes:

(1) We propose an effective semi-supervised clustering algorithm based on metric learning that incorporates instance-level knowledge in the form of pairwise constraints and attribute-level knowledge in the form of attribute order preferences. First, we construct an optimization problem that encodes the instance-level and attribute-level knowledge; then, we solve the optimization problem to obtain the attribute weights and the new metric; finally, we use the new metric to partition the datasets. Experimental results demonstrate the effectiveness of our method.

(2) We propose an effective semi-supervised clustering framework based on feature selection that uses both instance-level and attribute-level knowledge. The framework incorporates instance-level knowledge in the form of pairwise constraints and attribute-level knowledge in the form of attribute order preferences in sequence. First, we use the pairwise constraints to learn initial attribute weights by metric learning; then, we select features according to the initial weights and combine distance-based and constraint-based approaches to learn from the two types of knowledge. Experiments demonstrate that our framework outperforms both the semi-supervised clustering algorithms that integrate only one type of knowledge (instance-level or attribute-level) and the metric-learning-based semi-supervised clustering algorithm with instance-level and attribute-level knowledge.

(3) We integrate instance-level knowledge in the form of pairwise constraints and attribute-level knowledge in the form of important words into document clustering. We combine distance-based and constraint-based approaches to incorporate the instance-level and attribute-level knowledge in sequence. Experimental results show that integrating the two types of knowledge can effectively assist document clustering.
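The abstract does not state the optimization problem used in contribution (1), so the following Python sketch is only an illustration of the general idea and should not be read as the thesis's formulation. It assumes a diagonal metric (one non-negative weight per attribute), a simple objective that shrinks must-link distances, grows cannot-link distances, and penalizes violated attribute order preferences with a hinge term, and a clip-and-renormalize heuristic in place of an exact projection. The function names and the parameters `lam`, `margin`, `mu`, and `lr` are hypothetical.

```python
import numpy as np

def learn_attribute_weights(X, must_link, cannot_link, order_prefs,
                            lam=1.0, margin=0.1, mu=1.0, lr=0.05, n_iter=200):
    """Learn one weight per attribute (assumed objective, not the thesis's)."""
    n_attr = X.shape[1]
    w = np.full(n_attr, 1.0 / n_attr)
    # Per-attribute squared differences, summed over each constraint set.
    ml = sum(((X[i] - X[j]) ** 2 for i, j in must_link), np.zeros(n_attr))
    cl = sum(((X[i] - X[j]) ** 2 for i, j in cannot_link), np.zeros(n_attr))
    for _ in range(n_iter):
        # Gradient of: must-link distance - cannot-link distance + L2 penalty.
        grad = ml - cl + mu * w
        # Hinge term: preference (a, b) asks for w[a] >= w[b] + margin.
        for a, b in order_prefs:
            if w[a] - w[b] < margin:
                grad[a] -= lam
                grad[b] += lam
        # Heuristic projection: clip to non-negative, rescale to sum to 1.
        w = np.clip(w - lr * grad, 0.0, None)
        total = w.sum()
        w = w / total if total > 0 else np.full(n_attr, 1.0 / n_attr)
    return w

def weighted_kmeans(X, w, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means under the learned diagonal metric."""
    Z = X * np.sqrt(w)  # Euclidean distance in Z equals weighted distance in X
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # keep the old center if a cluster empties
                centers[c] = Z[labels == c].mean(axis=0)
    return labels

# Toy data: attribute 0 separates the two groups, attribute 1 is noise.
X = np.array([[0.0, 0.0], [0.1, 1.0], [5.0, 0.2], [5.1, 0.9]])
must_link = [(0, 1), (2, 3)]
cannot_link = [(0, 2), (1, 3)]
order_prefs = [(0, 1)]  # attribute 0 is more important than attribute 1

w = learn_attribute_weights(X, must_link, cannot_link, order_prefs)
labels = weighted_kmeans(X, w, k=2)
```

On this toy data the learned weights favour attribute 0, and the resulting partition respects the given must-link and cannot-link pairs. The three steps mirror the order described in (1): build the objective from both kinds of knowledge, solve for the weights, then cluster under the new metric.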

Keywords/Search Tags:

Semi-supervised clustering, Document clustering, Instance-level knowledge, Attribute-level knowledge, Pairwise constraints, Attribute order preferences

Related items

1	Semi-supervised Clustering With Constraints Assessment
2	Semi-supervised Clustering Algorithm And Implementation Based On Seeds Set And Pairwise Constraints
3	A Study On Semi-supervised Clustering Algorithm Based On Domain Knowledge
4	Research On Two Clustering Algorithms Based On Semi-Supervised Learning
5	Research On Semi-supervised Clustering Algorithms With Pairwise Constraints
6	Study Of Semi-supervised Fuzzy Clustering Algorithm Based On Pairwise Constraints
7	The Study On Personalized Search Based On Semi-Supervised Clustering
8	Semi-Supervised Clustering Algorithm Based On Pairwise Constraints And Its Parallel Implementation
9	Studies On Semi-Supervised Clustering Algorithms Based On Pairwise Constraints
10	Semi-supervised Subspace Clustering Based On Space-level Constraint