Study On Semi-supervised Clustering Incorporating Instance-level And Attribute-level Knowledge And Applications

Posted on: 2011-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Wu
GTID: 2218330341951106
Subject: Computer software and theory
Abstract/Summary:
As an important data mining technique, clustering automatically groups similar objects together while separating dissimilar ones. Traditional clustering relies on an unsupervised learning mechanism and partitions objects according to a specific distance or similarity measure. Because this mechanism cannot take the user's requirements into account, the resulting clusters are often hard to interpret and have low accuracy and stability. To address these problems, researchers have proposed semi-supervised clustering algorithms, which incorporate a small amount of prior knowledge to improve user satisfaction and the intelligibility of the clustering results.

Pairwise constraints are an important type of prior knowledge and comprise must-link constraints (two instances must be placed in the same cluster) and cannot-link constraints (two instances must be placed in different clusters). Compared with class labels, instance-level knowledge in the form of pairwise constraints fits the definition of clustering more naturally. In practice, users may provide attribute-level knowledge in addition to instance-level knowledge. Attribute order preferences are an effective type of attribute-level knowledge: each one states the relative importance of a pair of attributes, for example that attribute A matters more than attribute B. Incorporating attribute order preferences makes the weights of important attributes stand out and leads to more satisfying results.

Clustering can achieve better results by exploiting both types of prior knowledge. This thesis therefore studies how to integrate instance-level and attribute-level knowledge to guide clustering. The main work is as follows:

(1) We propose an effective semi-supervised clustering algorithm based on metric learning that incorporates instance-level knowledge in the form of pairwise constraints and attribute-level knowledge in the form of attribute order preferences.
First, we construct an optimization problem that captures both the instance-level and the attribute-level knowledge; we then solve it to obtain the attribute weights and a new metric; finally, we use the new metric to partition the dataset. Experimental results demonstrate the effectiveness of the method.

(2) We propose an effective semi-supervised clustering framework based on feature selection that incorporates instance-level knowledge (pairwise constraints) and attribute-level knowledge (attribute order preferences) in sequence. We first incorporate the pairwise constraints to learn initial attribute weights through metric learning; we then select features according to these initial weights and combine distance-based and constraint-based approaches to learn from both types of knowledge. Experiments show that this framework outperforms semi-supervised clustering algorithms that integrate only one type of knowledge (instance-level or attribute-level), as well as the metric-learning-based algorithm with instance-level and attribute-level knowledge described in (1).

(3) We integrate instance-level knowledge in the form of pairwise constraints and attribute-level knowledge in the form of important words into document clustering, combining distance-based and constraint-based approaches to incorporate the two types of knowledge in sequence. Experimental results show that integrating both types of knowledge effectively improves document clustering.
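The abstract does not state the optimization problem used in (1) or the exact feature-selection rule used in (2). Purely to illustrate the ingredients they rely on, the following Python sketch derives attribute weights from pairwise constraints with a simple separation-ratio heuristic (an assumption, not the thesis's formulation), repairs the weights so that attribute order preferences hold, keeps the top-k attributes, and computes a weighted distance. All function names and parameters are hypothetical.

```python
import numpy as np


def constraint_weights(X, must_link, cannot_link, eps=1e-9):
    """Hypothetical heuristic, NOT the thesis's optimization problem:
    score each attribute by how strongly it separates cannot-link pairs
    relative to must-link pairs, then normalize the scores to sum to 1.
    Assumes both constraint lists are non-empty."""
    X = np.asarray(X, dtype=float)
    ml = np.mean([(X[i] - X[j]) ** 2 for i, j in must_link], axis=0)
    cl = np.mean([(X[i] - X[j]) ** 2 for i, j in cannot_link], axis=0)
    score = cl / (ml + eps)
    return score / score.sum()


def enforce_preferences(w, preferences, max_iter=100):
    """Repair weights so that w[a] >= w[b] for each preference (a, b),
    meaning "attribute a is more important than attribute b".
    A violating pair is replaced by its average, which keeps the total
    weight unchanged. Stops after max_iter passes (hypothetical rule)."""
    w = np.array(w, dtype=float)
    for _ in range(max_iter):
        changed = False
        for a, b in preferences:
            if w[a] < w[b]:
                w[a] = w[b] = (w[a] + w[b]) / 2.0
                changed = True
        if not changed:
            break
    return w


def select_features(w, k):
    """Return the sorted indices of the k largest weights (hypothetical top-k rule)."""
    return np.sort(np.argsort(w)[::-1][:k])


def weighted_sq_dist(x, y, w):
    """Weighted squared Euclidean distance: sum_j w_j * (x_j - y_j)^2."""
    x, y, w = (np.asarray(v, dtype=float) for v in (x, y, w))
    return float(np.sum(w * (x - y) ** 2))
```

For instance, with points `[[0, 0], [0, 5], [10, 0], [10, 5]]`, must-links `(0, 1), (2, 3)` and cannot-links `(0, 2), (1, 3)`, the must-linked pairs agree on the first attribute while the cannot-linked pairs differ on it, so nearly all of the weight goes to that attribute and `select_features(w, 1)` keeps only index 0.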
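Items (2) and (3) combine distance-based and constraint-based approaches. A well-known constraint-based method is COP-KMeans, which assigns each point to the nearest center whose cluster does not break any must-link or cannot-link constraint. The Python sketch below follows that idea with a weighted squared Euclidean distance, so that attribute weights learned from either type of knowledge can enter the distance. It is an illustrative stand-in under that assumption, not the algorithm used in the thesis, and the function names are hypothetical.

```python
import numpy as np


def violates(i, cluster, labels, must_link, cannot_link):
    """Return True if assigning point i to `cluster` conflicts with a point
    that is already assigned (labels[j] == -1 means not yet assigned)."""
    for a, b in must_link:
        if i in (a, b):
            j = b if a == i else a
            if labels[j] != -1 and labels[j] != cluster:
                return True
    for a, b in cannot_link:
        if i in (a, b):
            j = b if a == i else a
            if labels[j] == cluster:
                return True
    return False


def cop_kmeans(X, k, w, must_link, cannot_link, n_iter=20, seed=0):
    """COP-KMeans-style clustering with a weighted squared Euclidean distance.
    Raises ValueError when some point has no constraint-respecting cluster."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.full(len(X), -1)
        for i in range(len(X)):
            # Weighted squared distance from point i to every center.
            d = (w * (X[i] - centers) ** 2).sum(axis=1)
            # Try centers from nearest to farthest; take the first feasible one.
            for c in np.argsort(d):
                if not violates(i, c, labels, must_link, cannot_link):
                    labels[i] = c
                    break
            else:
                raise ValueError(f"no feasible cluster for point {i}")
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

With points `[[0, 0], [0, 1], [10, 0], [10, 1]]`, must-link `(0, 1)` and cannot-link `(0, 2)`, every feasible assignment keeps points 0 and 1 together and separates them from point 2. Unlike plain k-means, this procedure can fail outright when the constraints leave a point with no admissible cluster, which is why it raises an error instead of silently breaking a constraint.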
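For (3), the abstract says only that important words serve as attribute-level knowledge in document clustering. One simple way to let such words influence a distance-based step, assumed here for illustration, is to up-weight their term frequencies before computing cosine similarity. The `boost` factor, whitespace tokenization, and function names are hypothetical choices, not details taken from the thesis.

```python
import math
from collections import Counter


def doc_vector(text, important_words, boost=2.0):
    """Term-frequency vector in which user-specified important words are
    multiplied by `boost` (hypothetical weighting scheme)."""
    tf = Counter(text.lower().split())
    return {t: c * (boost if t in important_words else 1.0) for t, c in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(t, 0.0) for t, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

For the documents "apple pie recipe", "apple stock earnings" and "pie recipe dessert", the first document is closer to the third without boosting (two shared words). Once the user marks "apple" as important with `boost=3.0`, it becomes closer to the second, so the important word changes which documents are grouped together.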
Keywords/Search Tags: Semi-supervised clustering, Document clustering, Instance-level knowledge, Attribute-level knowledge, Pairwise constraints, Attribute order preferences