Research On Performance Optimization And Parameter Selection Of Density Clustering Algorithm

Posted on: 2021-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: H L Zhu
Full Text: PDF
GTID: 2428330611463425
Subject: Computer software and theory
Abstract/Summary:
DBSCAN is a representative density-based clustering algorithm and an active research topic. This thesis studies DBSCAN in depth and improves both its performance and its parameter selection. The main points are as follows.

(1) DBSCAN, the representative density clustering algorithm, is the focus of this thesis. Clustering covers many kinds of algorithms, and at present no single algorithm clusters every kind of data set perfectly: each has its own advantages and applicable data types, and each has its own shortcomings. Density clustering is one family of clustering algorithms and itself contains many algorithms. This thesis analyzes DBSCAN in depth, describes its basic principle and algorithm design flow, and reviews the improvements to DBSCAN proposed in recent years. Building on this existing research, it proposes solutions for algorithm performance and parameter selection.

(2) To address the low efficiency of DBSCAN on large data sets, a fast grid density clustering algorithm based on square neighborhoods is proposed. First, square-neighborhood density clustering is defined: a square neighborhood replaces the circular neighborhood, so no distance calculation is needed, which greatly reduces the time complexity of the algorithm. Second, the concept of a grid for square-neighborhood density clustering is proposed, which allows the core points in high-density areas, and the density relationships between data points, to be determined quickly. Using these density relationships greatly reduces the number of passes over the data set and significantly improves clustering efficiency. Finally, the grid density cluster is proposed, in which density clusters are formed rapidly by exploiting the relationships between grids. The algorithm is tested on 16 data sets, and the experimental results are compared with those of algorithms from the existing literature. The comparison shows a significant improvement in clustering efficiency: the larger the data set, the more pronounced the efficiency gain. The algorithm is also applicable to multi-dimensional data.

(3) The two DBSCAN parameters, Eps and MinPts, are usually selected by experience. To address this shortcoming, a fast DBSCAN parameter selection algorithm that combines high-order difference with grid division is proposed. First, the relationship between the data points and the parameters is analyzed, and Eps and MinPts are obtained automatically by a high-order difference algorithm. Then, grid division is used to build a grid index over the data points; the index avoids redundant passes over the data set and improves the running efficiency of the algorithm. Finally, for data sets with too many noise points, a depolarization (extreme-elimination) operation is proposed to make the algorithm more robust. The algorithm is applied to nine data sets, including Flame, and compared with parameters selected by the traditional DBSCAN algorithm and by the AGD-DBSCAN algorithm in terms of clustering quality and running efficiency. The results show that the high-order difference algorithm is an effective method for automatic DBSCAN parameter selection, that grid division significantly improves its performance, and that the depolarization operation is necessary and effective. The proposed algorithm is therefore practical.
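
The square-neighborhood idea in point (2) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm, whose details are not given in this abstract; it only shows how an axis-aligned square neighborhood combined with a grid lets core points be found by coordinate comparisons alone, with no Euclidean distance. The function names `square_neighbor_counts` and `core_points`, the choice of a grid cell side equal to Eps, and the sample points are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def square_neighbor_counts(points, eps):
    """For each point, count the points (itself included) that fall inside
    its axis-aligned square neighborhood of half-width eps."""
    # Assumption: grid cell side equals eps, so every square neighbor of a
    # point lies in the point's own cell or in an adjacent cell.
    grid = defaultdict(list)
    for i, p in enumerate(points):
        cell = tuple(int(c // eps) for c in p)
        grid[cell].append(i)

    dim = len(points[0])
    offsets = list(product((-1, 0, 1), repeat=dim))
    counts = [0] * len(points)
    for cell, members in grid.items():
        # Gather candidate indices from the 3^dim block of surrounding cells.
        candidates = []
        for off in offsets:
            neighbor_cell = tuple(c + o for c, o in zip(cell, off))
            candidates.extend(grid.get(neighbor_cell, ()))
        for i in members:
            p = points[i]
            # Per-coordinate comparison only: no squares, no square roots.
            counts[i] = sum(
                1
                for j in candidates
                if all(abs(a - b) <= eps for a, b in zip(p, points[j]))
            )
    return counts


def core_points(points, eps, min_pts):
    """Indices of points whose square neighborhood holds at least min_pts points."""
    counts = square_neighbor_counts(points, eps)
    return [i for i, c in enumerate(counts) if c >= min_pts]


if __name__ == "__main__":
    sample = [(0.0, 0.0), (0.5, 0.5), (0.9, 0.1),
              (5.0, 5.0), (5.2, 5.1), (10.0, 10.0)]
    print(square_neighbor_counts(sample, eps=1.0))
    print(core_points(sample, eps=1.0, min_pts=3))
```

Because each point is compared only with candidates from neighboring cells, the work per point depends on local density and not on the size of the whole data set, which is the kind of saving the abstract attributes to the grid. Expanding clusters from the core points is left out of this sketch.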
Keywords/Search Tags:clustering analysis, density-based cluster, square neighborhood, grid, grid-based cluster, parameters selection, high order difference, grid partition, eliminate extremes