
The Improvement Research and Implementation of DBSCAN Algorithm Based on Information Entropy and Data Partition

Posted on: 2021-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Cai
Full Text: PDF
GTID: 2428330611465574
Subject: Computer technology
Abstract/Summary:
With the development of science and technology, the scale of data analysis has grown dramatically and data distributions have become increasingly complex. Density-based clustering algorithms are therefore expected to deliver higher performance on high-dimensional datasets with complex distributions. Density-based Spatial Clustering of Applications with Noise (DBSCAN), a well-known density-based clustering algorithm, has difficulty with the case in which different target clusters are “connected” by a few data points with strong internal correlation (for example, points forming a line). DBSCAN tends to identify such target clusters as the same cluster, which is called the linear connection problem. In addition, fixed parameters cannot meet the needs of high-dimensional heterogeneous clustering.

To improve the applicability of density-based clustering algorithms, this paper proposes a novel algorithm called Density-based Spatial Clustering with Self-adaptive Recognition and Adjustment (SRA-DBSCAN). It consists of a data block splitter, a local clustering module, a global clustering module, and a data block merger, which together produce adaptive clustering results. SRA-DBSCAN first uses the data block splitter to determine adaptively, top-down and based on information entropy, the dimension and threshold at which to partition. This partitions the original dataset into a set of data blocks. SRA-DBSCAN then runs the local clustering module, which has a two-stage adaptive parameter adjustment capability. Finally, SRA-DBSCAN recursively backtracks the data block split tree bottom-up using the global clustering module. During global clustering, the data block merger merges related local clusters that lie in different data blocks.

Experimental results on an artificial dataset with a complex data distribution that exhibits multiple types of the linear connection problem as well as the fixed parameter problem, together with results on multiple real-world datasets, show that the SRA-DBSCAN proposed in this paper has a complete adaptive recognition and adjustment capability. It can identify both obvious boundaries and weak connections (linear connections) between different target clusters, and it adjusts parameters adaptively within different data blocks. SRA-DBSCAN reduces sensitivity to input parameters and solves multiple types of the linear connection problem as well as the fixed parameter problem, outperforming the other baselines in a comprehensive evaluation of accuracy, precision, recall, and F1-score. It is suitable for density-based clustering in data analysis on high-dimensional datasets with complex distributions.
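The linear connection problem described above can be reproduced with a few lines of code. The following is a minimal sketch of textbook DBSCAN in plain Python, not the SRA-DBSCAN algorithm of the thesis. The toy dataset (two 5x5 grids joined by a thin row of points), the parameter values `eps=1.0` and `min_pts=3`, and all function names are assumptions chosen for illustration only.

```python
from math import dist


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN. Returns one label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


def count_clusters(labels):
    """Number of distinct clusters, ignoring noise."""
    return len(set(labels) - {-1})


# Hypothetical toy data: two dense grids and a one-point-wide bridge between them.
blob_a = [(x, y) for x in range(0, 5) for y in range(5)]
blob_b = [(x, y) for x in range(10, 15) for y in range(5)]
bridge = [(x, 2) for x in range(5, 10)]

separate = count_clusters(dbscan(blob_a + blob_b, eps=1.0, min_pts=3))
connected = count_clusters(dbscan(blob_a + bridge + blob_b, eps=1.0, min_pts=3))
print(separate, connected)  # prints "2 1"
```

Without the bridge, the two grids are reported as two clusters. With the bridge, every bridge point is itself a core point under these parameters, so density reachability chains through it and the two grids collapse into one cluster. This is the behavior the abstract calls the linear connection problem.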
Keywords/Search Tags: Density-based clustering, Data partition, Information entropy, Linear connection, Parameter adjustment