
Semi-supervised Clustering Algorithm Based On Label Propagation

Posted on: 2021-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: J B Wang
GTID: 2428330626955398
Subject: Computer technology
Abstract/Summary:
Cluster analysis is an important research field in data mining and is used in practice to explore the latent cluster structure of many kinds of data. Because real data are complex, clustering results may not match users' expectations. Semi-supervised clustering is an important technique for narrowing this gap: it uses prior knowledge about the cluster structure of a data set to guide the clustering process. This thesis focuses on how to utilize and propagate that prior information. We make a systematic study of the label propagation (LP) algorithm and of an algorithm for extending pairwise constraints. The main contributions of this thesis are as follows:

(1) We propose a label propagation algorithm with pairwise constraints, which is an extension of the LP algorithm. The algorithm stores the prior information in a pairwise relation matrix and measures the divergence between the pairwise relations given by the prior information and those given by the clustering result, instead of the divergence between the partition matrices. We build a new optimization model that transforms the optimization problem of the label propagation algorithm into a spectral clustering problem, and we use eigenvalue decomposition to obtain its optimal solution. The proposed algorithm not only solves the misalignment problem of the LP algorithm but can also handle pairwise constraint information. Finally, we compare the proposed algorithm with eight other semi-supervised clustering algorithms on eleven benchmark data sets. The experimental results show that the proposed algorithm is more effective than the other algorithms.

(2) The number of pairwise constraints is an important factor affecting the results of semi-supervised clustering. In practical applications, however, acquiring pairwise constraints is costly. We therefore propose a safety-based algorithm for extending pairwise constraints. We take the maximum local connected distance within the transitive closures as the safe value. According to the safe value, we modify the similarity between different transitive closures to reduce the risk of merging them. Finally, a modularity algorithm is used to merge similar transitive closures and thereby extend the pairwise constraints. We evaluate the extension algorithm on eight benchmark data sets. The experimental results show that the proposed algorithm can extend pairwise constraints safely and effectively.

(3) We design and develop a semi-supervised cluster analysis system. The system comprises data import, algorithm selection, and result display. It encapsulates the semi-supervised clustering algorithms used in this thesis and can be used to test different types of data sets and prior information. The system has good usability.

These contributions further enrich the research on semi-supervised clustering and provide new technical support for the study of label propagation algorithms.
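The abstract refers to a pairwise relation matrix and to transitive closures of pairwise constraints without giving their construction. The following is a minimal sketch of one common way to build both from must-link and cannot-link pairs. It is not the thesis's algorithm: the function names and the +1 / -1 / 0 encoding are assumptions chosen for illustration.

```python
def must_link_closures(n, must_link):
    """Return, for each of n points, the representative of its must-link
    transitive closure (computed with a simple union-find)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in must_link:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
    return [find(i) for i in range(n)]


def pairwise_relation_matrix(n, must_link, cannot_link):
    """Assumed encoding: +1 = same cluster, -1 = different clusters,
    0 = no prior information about the pair."""
    root = must_link_closures(n, must_link)

    # A cannot-link between two points applies to every pair of points
    # drawn from their two closures.
    blocked = set()
    for a, b in cannot_link:
        if root[a] == root[b]:
            raise ValueError("inconsistent constraints: %d and %d" % (a, b))
        blocked.add((root[a], root[b]))
        blocked.add((root[b], root[a]))

    R = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if root[i] == root[j]:
                R[i][j] = 1
            elif (root[i], root[j]) in blocked:
                R[i][j] = -1
    return R


# Points 0, 1, 2 are linked transitively; point 3 must differ from point 2.
R = pairwise_relation_matrix(5, [(0, 1), (1, 2)], [(2, 3)])
for row in R:
    print(row)
```

In this toy example the single cannot-link pair (2, 3) also separates points 0 and 1 from point 3, because they share a closure with point 2. Point 4 has no prior information and keeps zeros off the diagonal.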
Keywords/Search Tags: Cluster analysis, Semi-supervised clustering, Label propagation algorithm, Pairwise constraints, Spectral clustering