
Research On Possibilistic Mean Clustering Algorithm Based On Isolation Similarity

Posted on: 2022-06-23 | Degree: Master | Type: Thesis
Country: China | Candidate: T Z Chen
GTID: 2518306494953799 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, large amounts of data are produced and stored in everyday production and life. Data mining techniques meet the need to extract latent information and knowledge from such data and let the data deliver its full value. Clustering plays an important role in data mining. However, when dealing with massive, irregularly distributed, or high-dimensional data, traditional clustering algorithms suffer from high computational complexity and low clustering efficiency. This dissertation studies these problems and proposes three clustering algorithms. The main research work is as follows:

(1) Distribution-supervised possibilistic C-means and its incremental extension. For large volumes of data, this dissertation analyses the shortcomings of the fully unsupervised possibilistic C-means clustering algorithm and, on this basis, proposes a distribution-supervised possibilistic C-means clustering algorithm, which is then extended to an incremental algorithm for handling large amounts of data. By mapping the cluster centers of the old and new samples into a distinguishing space, the algorithm identifies new categories and distributions, recovering cluster centers that are otherwise difficult to identify and discovering new cluster centers and categories. The algorithm is evaluated on 11 popular UCI datasets and 4 synthetic datasets, and the experimental results demonstrate its effectiveness and robustness.

(2) Possibilistic C-means clustering based on nearest-neighbour-induced isolation similarity. For irregularly distributed data, this dissertation proposes a possibilistic C-means clustering algorithm based on nearest-neighbour-induced isolation similarity. First, the algorithm takes all samples as initial cluster centers and obtains k sub-clusters after several iterations; it then selects the first (41) samples farthest from each sub-cluster center to represent that sub-cluster. Next, the sub-clusters are mapped into the distinguishable space, and these representative samples are used to compute the nearest-neighbour-induced isolation similarity between sub-clusters. Finally, adjacent sub-clusters are merged according to the proposed merging strategy to obtain C clusters. The algorithm has been tested on 15 UCI benchmark datasets and 1 synthetic dataset. The experimental results show that it is suitable for clustering data that do not follow compact, cluster-shaped distributions, that its clustering quality is better than that of the comparison methods, and that it is highly robust.

(3) Deep embedding possibilistic mean clustering. For high-dimensional data, this dissertation proposes a novel deep embedding possibilistic mean clustering algorithm. First, the data are compressed into a lower-dimensional representation by a pre-trained autoencoder. Second, K-Multiple-Means clustering with embedded nearest-neighbour-induced isolation similarity is used to initialize the clustering layer for training, and the data features are then converted into clustering pseudo-label probabilities through a probability distribution. To improve the clustering result, the KL divergence between the improved possibilistic distribution and an auxiliary distribution is measured and minimized. Iterative training brings the membership relationship between the cluster centers and the data in the new feature space as close as possible to the true cluster distribution. The algorithm is evaluated on three image datasets and one text dataset, and the experimental results are discussed and analyzed in detail. They show that its clustering performance on these datasets is better than that of the comparison algorithms.
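All three algorithms in the abstract build on possibilistic C-means (PCM), whose memberships are "typicalities" that need not sum to 1 across clusters. The sketch below shows only the classic, fully unsupervised PCM iteration (Krishnapuram and Keller style), not the thesis's distribution-supervised or incremental variants, whose details the abstract does not give. The function names `sq_dist` and `pcm` are hypothetical, and the choice of the scale parameter `eta` (mean squared distance of each center's nearest points) is an assumption standing in for the usual fuzzy C-means-based estimate.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pcm(data: List[Point], centers: List[Point], m: float = 2.0,
        iters: int = 50) -> Tuple[List[List[float]], List[List[float]]]:
    """Plain possibilistic C-means; returns (centers, typicality[i][j])."""
    c = len(centers)
    centers = [list(v) for v in centers]
    dim = len(data[0])

    # Assumption: eta_i = mean squared distance of the points whose nearest
    # initial center is i (hypothetical stand-in for the FCM-based estimate).
    eta = []
    for i in range(c):
        members = [x for x in data
                   if min(range(c), key=lambda k: sq_dist(x, centers[k])) == i]
        e = sum(sq_dist(x, centers[i]) for x in members) / max(len(members), 1)
        eta.append(e if e > 0 else 1.0)

    def typicality() -> List[List[float]]:
        # t_ij = 1 / (1 + (d_ij^2 / eta_i)^(1 / (m - 1)))
        return [[1.0 / (1.0 + (sq_dist(x, centers[i]) / eta[i]) ** (1.0 / (m - 1.0)))
                 for x in data] for i in range(c)]

    for _ in range(iters):
        typ = typicality()
        # Center update: mean of the data weighted by t_ij^m.
        for i in range(c):
            w = [t ** m for t in typ[i]]
            total = sum(w)
            centers[i] = [sum(wj * x[d] for wj, x in zip(w, data)) / total
                          for d in range(dim)]
    return centers, typicality()
```

For two well-separated groups such as `[(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]` started from centers `(1, 1)` and `(10, 10)`, the centers settle near `(0.5, 0.5)` and `(10.5, 10.5)`. Because each cluster's typicalities are computed independently, a far-away point receives low typicality in every cluster, which is the property that makes PCM robust to outliers.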
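Contributions (2) and (3) rely on nearest-neighbour-induced isolation similarity. The following is a minimal sketch of the aNNE-style construction from the isolation-kernel literature, under the assumption that the thesis uses this standard form: in each of `t` rounds, `psi` points are drawn at random, each drawn point owns the Voronoi cell of points nearest to it, and the similarity of two points is the fraction of rounds in which they share a cell. The function name and the default values of `t`, `psi`, and `seed` are hypothetical, and how the thesis aggregates this over sub-cluster representatives is not shown.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def isolation_similarity(data: List[Point], x: Point, y: Point,
                         t: int = 200, psi: int = 4, seed: int = 0) -> float:
    """Estimate nearest-neighbour-induced isolation similarity of x and y.

    Each round samples psi points from data; x and y are 'isolated' into the
    cell of their nearest sampled point. The result is the fraction of rounds
    in which both land in the same cell (a value in [0, 1]).
    """
    rng = random.Random(seed)
    same = 0
    for _ in range(t):
        sample = rng.sample(data, psi)
        cell_x = min(range(psi), key=lambda k: sq_dist(x, sample[k]))
        cell_y = min(range(psi), key=lambda k: sq_dist(y, sample[k]))
        same += cell_x == cell_y
    return same / t
```

On the same two-group toy data, two points inside one group share a cell in roughly half the rounds, while points from different groups almost never do. Since the cells are induced by the data sample itself, they are small in dense regions and large in sparse ones, so the measure adapts to local density rather than depending on distance alone; `psi` controls how fine the partitions are.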
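Contribution (3) minimizes the KL divergence between a soft cluster-assignment distribution and an auxiliary (target) distribution. The abstract does not give the form of its improved possibilistic distribution, so the sketch below substitutes the standard DEC (Deep Embedded Clustering) formulation as a stand-in: a Student's t soft assignment `q`, a sharpened target `p` that squares `q` and normalizes by cluster frequency, and `KL(P || Q)`. The names `soft_assign`, `target_distribution`, and `kl_divergence` are hypothetical, and the gradient step that updates the encoder and centers is omitted.

```python
import math
from typing import List, Sequence

Point = Sequence[float]
Matrix = List[List[float]]


def soft_assign(z: List[Point], mu: List[Point], alpha: float = 1.0) -> Matrix:
    """DEC-style Student's t soft assignment q_ij; each row sums to 1."""
    q = []
    for zi in z:
        row = [(1.0 + sum((a - b) ** 2 for a, b in zip(zi, mj)) / alpha)
               ** (-(alpha + 1.0) / 2.0) for mj in mu]
        s = sum(row)
        q.append([v / s for v in row])
    return q


def target_distribution(q: Matrix) -> Matrix:
    """Auxiliary distribution p_ij proportional to q_ij^2 / f_j, f_j = sum_i q_ij."""
    f = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(len(row))]
        s = sum(w)
        p.append([v / s for v in w])
    return p


def kl_divergence(p: Matrix, q: Matrix) -> float:
    """KL(P || Q) summed over all samples and clusters."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q)
               for pij, qij in zip(prow, qrow) if pij > 0)
```

In a training loop, `p` would typically be recomputed periodically and held fixed while `kl_divergence(p, q)` is minimized with respect to the embedding and the cluster centers. Squaring `q` pushes confident assignments closer to 1, which is how the target steers the features toward purer clusters.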
Keywords: Possibilistic Clustering, Incremental Learning, Deep Clustering, Isolation Similarity