Research Of Natural Neighbor Based Density Clustering Algorithm And Its Parallelization

Posted on:2019-03-23

Degree:Master

Type:Thesis

Country:China

Candidate:J Li

Full Text:PDF

GTID:2428330566977997

Subject:Computer Science and Technology

Abstract/Summary:

Clustering analysis is a data processing technique that groups data objects by their similarity. It is widely applied in fields such as e-commerce and network security. As research has deepened, many new algorithms have emerged and the field has developed considerably in recent decades. Nevertheless, many problems remain open: how to handle high-dimensional data sets, how to distinguish clusters of various shapes, how to deal with noise points, how to handle data sets whose clusters differ greatly in density, how to determine the number of clusters effectively, and how to evaluate the quality of a clustering result.

Clustering analysis has many branches. In particular, density-based clustering algorithms cluster a data set by defining core points, boundary points, and density reachability. This approach handles clusters of different shapes well, identifies noise points accurately without requiring the number of clusters in advance, and is highly interpretable. Because of these advantages, many researchers have devoted themselves to this kind of algorithm in recent years. However, further study reveals several drawbacks. Take DBSCAN, one of the most classic algorithms, as an example. First, it depends heavily on its input parameters, whose selection has a significant effect on the clustering result. Second, boundary points are assigned according to the order in which core points are visited, which is hard to justify. Finally, it cannot handle data sets with large differences in density.

This thesis proposes a new density-based clustering algorithm, NN-DBSCAN, built on the natural neighbor algorithm. The method first preprocesses the data set with the natural neighbor algorithm to obtain partial prior information, which is used to extract the core points and to calculate a neighborhood radius for each data point. As a result, the method requires no input parameters and also handles clusters with large density differences well. The new algorithm also modifies DBSCAN's definition of direct density reachability so that boundary points are classified more effectively. After analyzing the time complexity of the new algorithm and the widely used parallel frameworks, we propose a new parallel framework based on data and process to parallelize the natural neighbor algorithm. Experimental results show that NN-DBSCAN outperforms DBSCAN on many data sets, and that the new parallel framework, which speeds up the natural neighbor algorithm, works more efficiently than Spark.
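
The abstract does not spell out the natural neighbor preprocessing step, so the following is only a minimal sketch of the natural neighbor search as it is commonly described: the neighborhood size r grows round by round until the number of points that nobody has chosen as a neighbor reaches zero or stops changing. The function name natural_neighbors, the brute-force distance matrix, and the exact stopping rule are assumptions made for illustration. The sketch does not reproduce the thesis's rules for selecting core points or computing per-point radii.

```python
import numpy as np


def natural_neighbors(points):
    """Parameter-free neighbor search (illustrative sketch, not the thesis code).

    Returns (lam, nb, neighbors):
      lam       -- the neighborhood size r at which the search stopped
      nb        -- nb[i] = how many points list i among their lam nearest neighbors
      neighbors -- neighbors[i] = set of points j such that i and j are mutual
                   lam-nearest neighbors
    """
    X = np.asarray(points, dtype=float)
    n = len(X)

    # Brute-force pairwise distances; fine for small n (O(n^2) memory).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)          # a point is never its own neighbor
    order = np.argsort(dist, axis=1)        # order[i, r] = (r+1)-th nearest of i

    nb = np.zeros(n, dtype=int)             # reverse-neighbor counts
    prev_orphans = -1
    lam = 0
    for r in range(n - 1):
        # Every point "votes" for its (r+1)-th nearest neighbor.
        np.add.at(nb, order[:, r], 1)
        lam = r + 1
        orphans = int(np.sum(nb == 0))      # points nobody has voted for yet
        # Assumed stopping rule: no orphans left, or the count is stable.
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans

    knn = [set(int(j) for j in order[i, :lam]) for i in range(n)]
    neighbors = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return lam, nb, neighbors


if __name__ == "__main__":
    # Two well-separated groups of four points each (made-up data).
    pts = [[0, 0], [0, 1], [1, 0], [1, 1],
           [10, 10], [10, 11], [11, 10], [11, 11]]
    lam, nb, neighbors = natural_neighbors(pts)
    print("lam =", lam)
    for i, ns in enumerate(neighbors):
        print(i, sorted(ns))
```

Because the neighborhood size is derived from the data, no k or radius has to be supplied by the caller. A density-based method could then use quantities such as the reverse-neighbor counts in nb as prior information, which is the general role the abstract assigns to this preprocessing step.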

Keywords/Search Tags:

Clustering, natural neighbor, density, core point, DBSCAN

Related items

1	Research And Application Of Density Peak Clustering Algorithm Based On Natural Neighbors And Representative Points
2	Research On Density Clustering Algorithm Based On Reference Points
3	Optimization Research Of Density Peaks Clustering Algorithm Based On Neighbor Searching
4	Research On Adaptive Density Peaks Clustering Algorithm Based On Natural Neighbor
5	Study On Clustering Algorithm Based On Density Core
6	Research On Fast Density Clustering Algorithm Based On Nearest Neighbor Query Technology
7	Study On Cluster Analysis And Outlier Detection Based On Natural Neighbor And Local Resultant Force
8	Research On Density Clustering Algorithm Based On DBSCAN For Personalized Clustering
9	Study On Cluster Analysis And Outlier Detection Based On Natural Neighbor And Density Core
10	Research And Application Of Clustering Algorithm Based On DBSCAN