Research On High Dimensional Data Clustering Algorithm Based On Subspace And Density Peak

Posted on:2019-07-07

Degree:Master

Type:Thesis

Country:China

Candidate:W Tan

Full Text:PDF

GTID:2428330572995097

Subject:Software engineering

Abstract/Summary:


With the rapid development of computer technology, the explosive growth of data makes it increasingly difficult to find valuable information. Conventional methods achieve good clustering results on low-dimensional data sets, but because of the "curse of dimensionality" they fail to produce the desired results on high-dimensional data sets. More comprehensive clustering methods are therefore urgently needed. This thesis studies clustering algorithms applicable to high-dimensional data sets. First, the background and significance of high-dimensional data clustering in data mining are introduced, along with the state of clustering algorithm research in China and abroad. Then the relevant background on clustering is presented. Based on an extensive literature review, several improvements to existing algorithms are proposed. The main work is as follows:

(1) The advantages and disadvantages of existing clustering algorithms were summarized, especially those of the CLIQUE algorithm and the DPC algorithm. The equal-width grid partitioning in CLIQUE may lose some cluster points and destroy the integrity of dense regions, and the manually supplied density threshold is arbitrary, so an appropriate value is difficult to determine. The DPC algorithm can only handle small and medium-sized data sets, and it cannot distinguish outliers from cluster boundary points.

(2) An adaptive high-dimensional subspace clustering algorithm, REG-CLIQUE, was proposed. A binary tree was combined with relative entropy to perform adaptive grid partitioning, remove redundant dimensions, and improve clustering accuracy. A formula for the density threshold was proposed, and a suitable value was obtained recursively, which greatly reduced the algorithm's dependence on prior knowledge. Results showed that REG-CLIQUE achieves adaptive clustering and outperforms the GP-CLIQUE and CLIQUE algorithms in both clustering time and accuracy.

(3) An improved density peak clustering algorithm, SREDPC, was proposed. High-dimensional large data sets are sampled. Residual squares were used to provide a better decision graph than that of the DPC algorithm for determining cluster centers, and outliers are distinguished from the boundary points that belong to clusters by halo recognition. Results showed that the improved algorithm can be applied to high-dimensional large data sets and is superior to the original DPC algorithm in both time complexity and clustering results.
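
For readers unfamiliar with the decision graph mentioned above, the following is a minimal sketch of the two quantities computed by the baseline DPC algorithm (local density rho and distance delta to the nearest denser point). It is not the SREDPC method of the thesis: sampling, residual squares, and halo recognition are not shown. The function name dpc_decision_values, the cutoff parameter d_c, and the use of a simple cutoff kernel are assumptions made for illustration.

```python
import numpy as np

def dpc_decision_values(points, d_c):
    """Baseline DPC quantities (illustrative sketch, not the thesis's SREDPC).

    Returns (rho, delta, nearest_denser) where
      rho[i]            = number of other points closer than d_c (cutoff kernel),
      delta[i]          = distance to the nearest point of higher density,
      nearest_denser[i] = index of that point, or -1 for the densest point.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Full pairwise Euclidean distance matrix: O(n^2) memory, which is
    # one reason plain DPC is limited to small and medium data sets.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Subtract 1 so that a point does not count itself.
    rho = (dist < d_c).sum(axis=1) - 1

    delta = np.zeros(n)
    nearest_denser = np.full(n, -1)

    # Visit points from densest to sparsest; ties are broken by index,
    # so every point except the first has at least one "denser" point.
    order = np.argsort(-rho, kind="stable")
    for rank, i in enumerate(order):
        if rank == 0:
            # Convention: the densest point gets its largest distance.
            delta[i] = dist[i].max()
            continue
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        nearest_denser[i] = j
    return rho, delta, nearest_denser
```

Plotting delta against rho gives the decision graph; points with both large rho and large delta are candidate cluster centers, and the remaining points can be assigned by following nearest_denser back to a center.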

Keywords/Search Tags:

Big data, Clustering, Subspace, Adaptive, Density peak


Related items

1	Research On Adaptive Density Peak Clustering Algorithm
2	Research On Target Detection Method Based On Density Peak Clustering
3	Research On The Grid Density Peak Clustering Algorithm
4	Research And Application Of Density Peak Clustering Algorithm Based On Spark Framework
5	Research And Application Of Financial Big Data Based On Density Peak Clustering Of K Near Neighbors
6	Multi-Granular Big Data Analytics Based On Density Peak
7	Research On Improved Density Peak Clustering Algorithm
8	Research And Application Of Clustering Algorithm Based On Density Peak
9	Research On Application And Optimization Of Density Peak Clustering
10	Research On Density-Based Subspace Clustering Algorithm For Data Streams