Research On DBSCAN Algorithm Based On Grid And Density-ratio

Posted on: 2019-10-20 | Degree: Master | Type: Thesis
Country: China | Candidate: R Pu
GTID: 2428330545454768 | Subject: Computer application technology
Abstract:
In recent years, with the rapid development of science and technology, the volume of data has kept expanding, and how to analyze such massive data effectively has become both a hot topic and a difficult problem in current research. Clustering analysis has been widely used in bioinformatics, computer vision, text categorization, and other fields. As a typical density-based clustering algorithm, DBSCAN is widely used because it can identify clusters of arbitrary shape and can effectively identify noise points. However, the algorithm itself has some problems. Because it uses a single threshold, MinPts, to delimit all clusters, DBSCAN clusters poorly when the data distribution is not uniform. In addition, existing incremental clustering methods have difficulty meeting the needs of incremental processing. To address these problems, the following research was carried out in this thesis.

(1) The density-based clustering conditions were analyzed in depth, and a DBSCAN clustering algorithm based on grid and density ratio was proposed. First, the data space was divided by a multi-resolution grid, partitioning the data into multiple grid cells. Using this grid, the peaks and valleys of the gridded data space, that is, the sets of its maxima and minima, could be found quickly. Density estimation from the density-ratio clustering algorithm was then used to calculate densities, so that the algorithm could determine the density-ratio threshold for clustering quickly and adaptively, and DBSCAN clustering was performed with this density-ratio threshold. Finally, the influence of different neighborhood ratios on the clustering results was tested, the time complexity and clustering quality of the proposed algorithm were evaluated, and the algorithm was compared with the DBSCAN and DPC algorithms. Simulation experiments showed that, compared with the traditional DBSCAN method, clustering accuracy was significantly improved with only a small increase in time complexity.

(2) Data studied in the era of big data are not always static; more often, one faces continuously growing data. To improve the efficiency of current clustering algorithms in the incremental clustering process, an incremental clustering method based on grid division was proposed. When a small batch of new data is added to the original data set, the algorithm clusters only the incremental data, then merges these results into the initial clustering results according to the matched grid information to obtain the final clustering results. Finally, the proposed method was compared with the traditional DBSCAN algorithm in terms of time complexity and clustering quality, and its feasibility and efficiency were tested. Experiments showed that batch incremental processing of new data objects could be carried out rapidly, with some loss of clustering accuracy.
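Contribution (1) combines a grid over the data space with a density-ratio core condition. The following is a minimal, hypothetical Python sketch of those ingredients, not the thesis's implementation. It builds a single-resolution 2-D grid (the thesis uses a multi-resolution grid), reports peak and valley cells by comparing each cell's point count with its occupied neighbors, and runs DBSCAN with a density-ratio core test. Several parts are assumptions: the ratio formulation (density within `eps` divided by density within `eta * eps`, with `eta` standing in for the neighborhood ratio), the extra `min_pts` guard, and all function names. The abstract does not say how the threshold `tau` is derived from the peaks and valleys, so here `tau` is simply a parameter.

```python
import math
from collections import defaultdict


def build_grid(points, cell):
    """Map each occupied square cell (side `cell`) to the indices of its points."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(math.floor(x / cell), math.floor(y / cell))].append(idx)
    return dict(grid)


def peaks_and_valleys(grid):
    """Occupied cells whose point count is a local maximum (peak) or minimum
    (valley) among their occupied 8-neighbors, at a single grid resolution."""
    peaks, valleys = [], []
    for (i, j), members in grid.items():
        counts = [len(grid[(i + di, j + dj)])
                  for di in (-1, 0, 1) for dj in (-1, 0, 1)
                  if (di, dj) != (0, 0) and (i + di, j + dj) in grid]
        if not counts:
            continue  # isolated cell: nothing to compare against
        if len(members) >= max(counts):
            peaks.append((i, j))
        if len(members) <= min(counts):
            valleys.append((i, j))
    return peaks, valleys


def density_ratio_dbscan(points, eps, eta, tau, min_pts=2):
    """DBSCAN with a density-ratio core test (brute-force neighbor search).

    A point is core when (n_eps / eps^2) / (n_big / (eta*eps)^2) >= tau, where
    n_r counts points within radius r (the point itself included); the circle
    constant cancels in 2-D. min_pts is an assumed guard so that isolated
    points, whose ratio is eta^2, are not treated as core."""
    n = len(points)

    def neighbors(i, r):
        xi, yi = points[i]
        return [j for j in range(n)
                if math.hypot(points[j][0] - xi, points[j][1] - yi) <= r]

    core = []
    for i in range(n):
        n_small = len(neighbors(i, eps))
        n_big = len(neighbors(i, eta * eps))
        ratio = (n_small / eps ** 2) / (n_big / (eta * eps) ** 2)
        core.append(ratio >= tau and n_small >= min_pts)

    labels = [-1] * n  # -1 marks noise
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]  # grow the cluster, expanding only through core points
        while stack:
            p = stack.pop()
            for q in neighbors(p, eps):
                if labels[q] == -1:
                    labels[q] = cluster
                    if core[q]:
                        stack.append(q)
        cluster += 1
    return labels
```

With `tau` near 1, a point counts as core when its local density is at least that of its wider surroundings. This is why a ratio test can adapt to clusters of different absolute density, where a single MinPts cannot.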
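Contribution (2) clusters only a newly added batch and merges it into the existing result through matching grid cells. Here is a minimal Python sketch of that idea under stated assumptions. The original clustering is summarized as a hypothetical `cell_labels` dictionary from grid cell to cluster id. The new batch is grouped by simple 8-connectivity of occupied cells, a stand-in rather than the thesis's own procedure. A new group that touches labelled cells joins that cluster (merging any clusters it bridges); otherwise it opens a new cluster. All names and the merge rule are illustrative.

```python
import math
from collections import defaultdict


def cell_of(point, cell):
    """Grid cell (i, j) containing a 2-D point, for square cells of side `cell`."""
    return (math.floor(point[0] / cell), math.floor(point[1] / cell))


def grid_components(cells):
    """Label 8-connected groups of occupied cells; returns {cell: group id}."""
    comp, next_id = {}, 0
    for start in sorted(cells):  # sorted for a deterministic labelling
        if start in comp:
            continue
        comp[start] = next_id
        stack = [start]
        while stack:
            i, j = stack.pop()
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    nb = (i + di, j + dj)
                    if nb in cells and nb not in comp:
                        comp[nb] = next_id
                        stack.append(nb)
        next_id += 1
    return comp


def incremental_update(cell_labels, new_points, cell):
    """Cluster only `new_points`, then merge the result into `cell_labels`
    (a dict mapping grid cell -> cluster id, updated in place).
    Returns the cluster id assigned to each new point."""
    new_cells = {cell_of(p, cell) for p in new_points}
    groups = defaultdict(set)
    for c, g in grid_components(new_cells).items():
        groups[g].add(c)

    next_label = max(cell_labels.values(), default=-1) + 1
    for cells in groups.values():
        # Existing cluster ids on, or adjacent to, this group's cells.
        touched = {cell_labels[(i + di, j + dj)]
                   for (i, j) in cells
                   for di in (-1, 0, 1) for dj in (-1, 0, 1)
                   if (i + di, j + dj) in cell_labels}
        if touched:
            target = min(touched)
            # Hypothetical rule: a group bridging several clusters merges them.
            for c, lab in list(cell_labels.items()):
                if lab in touched:
                    cell_labels[c] = target
        else:
            target = next_label  # no matching cell: start a new cluster
            next_label += 1
        for c in cells:
            cell_labels[c] = target
    return [cell_labels[cell_of(p, cell)] for p in new_points]
```

Only the new batch's cells and their immediate neighbors are matched against the stored labels, so the original points are never re-clustered. The relabelling loop in the bridging case is linear in the number of labelled cells.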
Keywords: grid, density ratio, DBSCAN, incremental clustering