Research On Multi Density Clustering Algorithm

Posted on: 2019-01-13  Degree: Master  Type: Thesis
Country: China  Candidate: L Z Han  Full Text: PDF
GTID: 2348330542981792  Subject: Software engineering
Abstract/Summary:
With the popularity of computers and the development of the Internet of Things, the era of the information explosion has arrived. Data is generated daily at terabyte and petabyte scale or larger. This growth has driven advances in data acquisition and storage technology, but how to process the data and extract useful information from it poses a great challenge to researchers. Data mining has emerged in response and has become a hot topic for a growing number of scholars. Data mining is the technology of acquiring knowledge from large-scale, noisy data and applying it in daily life, research, and other fields. Cluster analysis is one of its important research directions and has been widely applied.

Research on traditional clustering has produced many classical algorithms and improvements on them. The improvements target the algorithm's adaptability to the data, its time complexity, its sensitivity to parameters, its recognition of noise points, its dependence on prior knowledge, and so on. Nevertheless, defects inevitably remain in some respects. Density-based clustering, for example, can identify clusters of arbitrary shape in noisy data, but its low efficiency limits its ability to process large-scale data, and its handling of multi-density data remains to be improved. Grid-based clustering, as another example, divides the data space into a finite number of grid cells and treats the cell, rather than the individual data object, as the unit of processing. Under normal circumstances the number of cells is much smaller than the number of data objects, so there are fewer objects to process. The advantages of this method are therefore low time complexity and short running time, and the result is independent of the input order of the data. However, existing grid-based clustering algorithms achieve good clustering results only on uniformly distributed data sets; for unevenly distributed data there is still considerable room for improvement in accuracy. To solve the above problems, the main research work of this thesis is as follows:

DBSCAN, a classical density-based clustering algorithm, can identify clusters of arbitrary shape in noisy data. Because its Eps and MinPts parameters are fixed, however, its results on multi-density data are not ideal. To solve this problem, this thesis proposes a multi-density DBSCAN clustering algorithm. The algorithm first preprocesses the data set, giving each object an extra attribute that records the object's density within a neighborhood of a given radius, and ranks the data set by this attribute. Density threshold parameters suited to each density level are then generated adaptively. The algorithm handles multi-density data effectively, and the preprocessing step avoids the influence of data input order on the clustering results.

To improve the clustering accuracy of ECRGDD, an extended clustering algorithm based on relative grid density difference, a new clustering algorithm based on dynamic relative density differences between grids, called CDGRDD, is proposed. The algorithm defines the initial grid cell density dynamically, which improves on ECRGDD's poor results for clusters with a dense center and sparse edges. In addition, a distance condition is applied when a cell's density is similar to that of an adjacent cell, which reduces the blindness of grid merging. Experimental results show that CDGRDD can effectively cluster multi-density data of arbitrary shape.

Grid partitioning can improve efficiency. Applying this idea to DBSCAN, a DBSCAN clustering algorithm based on region division is proposed. The algorithm divides the data space into regions of different densities using the relative density difference between grids, automatically generates a different Eps parameter for each region according to its density, and finally clusters each region with DBSCAN. As a result, DBSCAN searches for density-connected objects only within a region and avoids traversing all the data. The approach effectively improves the precision of DBSCAN even when there are many regions.
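
The abstract does not give implementation details, so the following is only a minimal sketch of two building blocks it mentions: recording each object's density within a given radius and ranking the data set by it, and counting points per grid cell. The function names `rank_by_density` and `grid_cell_counts`, the parameters `radius` and `cell_size`, and the tie-breaking rule are all assumptions made for illustration; they are not taken from the thesis, and the adaptive generation of Eps and the relative-density-difference merging are not shown.

```python
from collections import Counter
from math import dist


def rank_by_density(points, radius):
    """Return (density, point) pairs ordered from densest to sparsest.

    Density here is assumed to mean the number of *other* points lying
    within `radius` of a point (brute force, O(n^2) distance checks).
    """
    ranked = []
    for i, p in enumerate(points):
        count = sum(
            1
            for j, q in enumerate(points)
            if j != i and dist(p, q) <= radius
        )
        ranked.append((count, p))
    # Sort by descending density, then by coordinates, so the ranking
    # does not depend on the order in which the points were supplied.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return ranked


def grid_cell_counts(points, cell_size):
    """Assign each point to a square grid cell and count points per cell."""
    return Counter(
        tuple(int(coord // cell_size) for coord in p) for p in points
    )


if __name__ == "__main__":
    sample = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
    for density, point in rank_by_density(sample, radius=0.5):
        print(density, point)
    print(grid_cell_counts(sample, cell_size=1.0))
```

Sorting on the density attribute before clustering is one simple way to make the processing order depend on the data rather than on the input order, which is the property the abstract attributes to its preprocessing step.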
Keywords/Search Tags:multi-density clustering, cluster analysis, region division, ambiguity function