Research On Two Clustering Algorithms Based On Semi-Supervised Learning

Posted on: 2012-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: S H Xiong
GTID: 2218330368979461
Subject: Computer software and theory
Abstract/Summary:
Supervised learning and unsupervised learning are two frequently used methods in machine learning. In supervised learning, a large amount of labeled data is taken as prior knowledge to construct a model, which is then used to predict unlabeled data. Unsupervised learning analyzes the data and completes clustering without any prior knowledge. In many practical applications, however, labeled data is scarce, or labeling it costs considerable human resources, material resources, and time. Semi-supervised learning combines the advantages of both traditional approaches: it uses a small amount of "expensive" labeled data together with "cheap" unlabeled data as prior knowledge to guide the learning process. Compared with traditional machine learning algorithms, semi-supervised learning can achieve good learning results, so it is significant for both theoretical research and practical application.

In this thesis, we mainly introduce two semi-supervised clustering algorithms that extend classical clustering algorithms.

In many practical problems, only a small set of pairwise constraints is available. We therefore use the inherent spatial structure of the dataset to extend the pairwise constraints, and propose three methods for doing so. First, the two-valued transitive relations of the pairwise constraints are used to extend them. Second, based on two hypotheses of semi-supervised learning, we replace the traditional Euclidean distance with the manifold density-sensitive distance and extend the pairwise constraints on the basis of that distance. Third, we propose active learning strategies to complete the extension of the pairwise constraints.
The aim is to find representative pairwise constraints that play an active role in clustering.

We integrate the extended set of pairwise constraints into dimensionality reduction and clustering, and propose a semi-supervised clustering algorithm for high-dimensional data: the data are projected onto a low-dimensional manifold, and a K-means algorithm based on pairwise constraints is used at the same time to cluster them. The proposed algorithm not only handles high-dimensional data and reduces the computational complexity of semi-supervised clustering, but also resolves violations of the pairwise constraints during clustering and effectively improves the clustering results.

Clustering methods based on center partitioning are not suited to multi-scale data or to data of arbitrary shape. In addition, the affinity propagation (AP) algorithm tends to produce more local clusters. Therefore, this thesis proposes semi-supervised affinity propagation clustering based on space consistency, designed specifically for multi-scale and arbitrarily shaped data. The method first adjusts the similarity matrix, using the extended pairwise constraints to construct a sparse similarity matrix. It then performs a manifold search in the data space to distinguish the different distributions of the spatial data. According to these distributions, a measure is devised to describe the characteristics of the data. For the global distribution, the distance between data points is changed by the measure function; for the local distribution, a hyperspherical or ellipsoidal shape is transformed within the same distribution. Simulation experiments show that the semi-supervised affinity propagation clustering method achieves better clustering results than the traditional AP algorithm and other classical center-based clustering algorithms.
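
The first extension method, using the transitive relations of the pairwise constraints, can be illustrated with a short Python sketch. It assumes the constraints are the usual must-link and cannot-link pairs and applies the standard closure rules: must-link is transitive, and a cannot-link between two points extends to everything must-linked to either of them. The abstract does not give the thesis's own procedure, so the function name `extend_constraints` and its signature are hypothetical.

```python
from itertools import combinations, product


def extend_constraints(n_points, must_link, cannot_link):
    """Return the transitive closure of must-link / cannot-link pairs.

    Hypothetical helper for illustration only. Points are indexed
    0..n_points-1 and pairs are returned as sets of (i, j) with i < j.
    """
    parent = list(range(n_points))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Must-link behaves like an equivalence relation: merge linked points.
    for a, b in must_link:
        parent[find(a)] = find(b)

    # Group points by the root of their must-link component.
    groups = {}
    for i in range(n_points):
        groups.setdefault(find(i), []).append(i)

    # Every pair inside a component is must-linked.
    ml_closed = set()
    for members in groups.values():
        ml_closed.update(combinations(members, 2))

    # A cannot-link between two points separates their whole components.
    cl_closed = set()
    for a, b in cannot_link:
        ra, rb = find(a), find(b)
        if ra == rb:
            raise ValueError(f"constraints conflict on pair ({a}, {b})")
        for i, j in product(groups[ra], groups[rb]):
            cl_closed.add((min(i, j), max(i, j)))

    return ml_closed, cl_closed
```

For example, with five points, must-link pairs (0, 1) and (1, 2), and cannot-link pair (2, 3), the closure adds the must-link pair (0, 2) and the cannot-link pairs (0, 3) and (1, 3). The distance-based and active-learning extensions described in the abstract are not covered by this sketch.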
Keywords/Search Tags: semi-supervised learning, pairwise constraints, closure, prior knowledge, clustering, affinity propagation