
Research On Clustering Method Based On Improved DBSCAN

Posted on: 2024-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: J L Zhang
GTID: 2568307064997249
Subject: Engineering
Abstract/Summary:
We now live in an era of digitalization, informatization and networking. With the rapid development of science and technology, vast amounts of data are generated in every field and industry, and turning such massive, complex data into understandable and usable information has become particularly important. Data mining is an important means of uncovering the information hidden in data, and clustering analysis, which plays a central role in data mining, has increasingly become a focus of research. Because it is efficient and simple, clustering analysis is widely used in information communication, intelligent manufacturing, biomedicine, satellite remote sensing, image recognition, social security and other fields.

Existing clustering methods fall into five categories: partition-based, hierarchy-based, density-based, grid-based and model-based. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a representative density-based clustering algorithm that can identify clusters of arbitrary shape and mark outliers in different data sets. DBSCAN nevertheless has three weaknesses. First, its clustering precision is sensitive to parameter selection, which depends heavily on expert knowledge and experience. Second, it performs poorly on data sets with uneven density distribution: if only a few density-reachable points lie between two clusters, the two clusters are wrongly merged into one. Third, it must traverse the entire data set multiple times to find density-reachable points, which makes the algorithm complex and costly, particularly for large, high-dimensional data sets.

To address these problems, this thesis proposes TDBO (Three stages of improved DBSCAN algorithm), a novel three-stage clustering algorithm based on an improved DBSCAN. In the first stage, a probability strategy and a queue data structure are used to merge core-areas into initial clusters. In the second stage, points that have not yet been grouped are either assigned to the corresponding initial clusters, yielding intermediate clusters, or identified as outliers. In the third stage, intermediate clusters are merged according to a merging principle to obtain the final clusters. Besides the three-stage clustering framework, the main contributions of this thesis are as follows:

(1) A probability strategy is proposed to accurately calculate the nearest distance between two core-areas. The strategy clips and divides the core-areas and randomly selects data points within the divided regions, so as to obtain the nearest distance between the two core-areas and determine whether they can be merged.

(2) To avoid repeated traversals of the data set and reduce the complexity and redundancy of the algorithm, a queue data structure is used: by enqueuing and dequeuing core points, the algorithm finds all core-areas that belong to the same cluster and merges them to obtain the initial clusters.

(3) Once the initial clusters are obtained, the unallocated data points are assigned to an initial cluster or marked as outliers, producing the intermediate clusters. To decide whether intermediate clusters can be merged, a merging principle is proposed, so that the algorithm obtains better clustering results on data sets with uneven density distribution.

(4) To avoid manual parameter setting and tedious, time-consuming trial-and-error tuning, the algorithm incorporates the black widow optimization algorithm, which simulates the biological behavior of black widow spiders, to determine the optimal parameters adaptively and improve clustering accuracy.

To verify its performance, TDBO is tested on artificial and real-world data sets, and its clustering results are compared with those of classical and recent clustering algorithms. TDBO achieves better results on multiple evaluation indicators, improving clustering accuracy while also simplifying the algorithm.
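The abstract does not give enough detail to reproduce TDBO itself, so the sketch below shows only the classic DBSCAN baseline it improves on. It is a minimal illustration, not the thesis's method. Clusters are expanded breadth-first from core points using a FIFO queue, and `eps` and `min_pts` play the roles of DBSCAN's usual Eps and MinPts parameters. The function names `region_query` and `dbscan` and the toy grid data are hypothetical. The usage lines reproduce the weakness described above: a thin bridge of density-reachable points between two dense blobs makes DBSCAN merge them into a single cluster.

```python
from collections import deque
import math

NOISE = -1        # label for outliers
UNVISITED = None  # label for points not yet examined


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Classic DBSCAN (not TDBO): one label per point, 0, 1, ... for clusters, -1 for noise."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # i is a core point: start a new cluster and expand it breadth-first.
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # noise reachable from a core point becomes a border point
            if labels[j] is not UNVISITED:
                continue  # already assigned to a cluster; nothing more to expand
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels


# Hypothetical toy data on an integer grid: two dense 5x5 blobs and one isolated point.
blob_a = [(x, y) for x in range(5) for y in range(5)]
blob_b = [(20 + x, y) for x in range(5) for y in range(5)]
outlier = [(10, 10)]

labels_separate = dbscan(blob_a + blob_b + outlier, eps=1.5, min_pts=3)
print(sorted(set(labels_separate)))  # [-1, 0, 1]: two clusters plus one noise point

# A one-point-wide "bridge" of density-reachable points joins the two blobs.
bridge = [(x, 2) for x in range(5, 20)]
labels_bridged = dbscan(blob_a + blob_b + bridge + outlier, eps=1.5, min_pts=3)
print(sorted(set(labels_bridged)))  # [-1, 0]: the two blobs collapse into one cluster
```

Each `region_query` call scans every point, so this naive version performs O(n²) distance computations in total. That repeated scanning is the kind of traversal overhead the abstract's queue-based core-area merging is said to reduce.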
Keywords: Density-based clustering, DBSCAN, Black Widow Optimization, uneven density distribution