Research On Density Clustering Algorithm Based On Sampling And Grouping

Posted on: 2021-10-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Zheng
Full Text: PDF
GTID: 2518306107993639
Subject: Engineering
Abstract/Summary:
The amount of data generated in the era of big data is increasing exponentially, and great value is hidden in these data. Finding the information of interest in massive data is difficult, but data mining can find it accurately and efficiently. As a common data mining technique, cluster analysis is well suited to information mining when no prior knowledge is available. Density-based clustering is an important family of clustering methods: it can detect clusters of arbitrary shape, handles outliers, and does not need the number of clusters to be fixed in advance. A widely used density-based clustering algorithm is DBSCAN, which shares these characteristics and produces good results on noisy data sets. Although DBSCAN is an attractive solution to many problems, its high time complexity means that, like other clustering algorithms, it does not scale well to large data sets. This thesis therefore proposes improvements in two directions: reducing the size of the data set that DBSCAN processes by sampling, and reducing the time complexity of DBSCAN itself.

First, to reduce the size of the data set, this thesis proposes two methods for generating sample data. The first method improves the leader algorithm used in Rough-DBSCAN, giving a simple way to obtain better results. The improved leader algorithm is called Leader*, and the resulting improved DBSCAN algorithm is called Rough*-DBSCAN. The second method, I-DBSCAN, is a new heuristic algorithm. It extracts sample elements from the intersections of the clusters found by the Leader* algorithm, adapts to all data sets, and produces good results without any additional parameters.

Second, to reduce the time complexity, this thesis proposes a grouping method named Groups-DBSCAN that speeds up neighborhood search queries. The groups method builds a graph-based index structure on the data. Unlike traditional hierarchical index structures such as the R-tree, this method is suitable for high-dimensional data sets. In addition, Groups-DBSCAN remains effective when the data contain large amounts of noise and does not degrade the performance of DBSCAN. The clustering result of Groups-DBSCAN is the same as that of DBSCAN, but its running time is shorter.
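As a rough illustration of the sampling idea, the following Python sketch shows the classic single-pass leaders method on which Rough-DBSCAN builds. It is not the Leader* algorithm of this thesis, whose details the abstract does not give. The function name `leaders`, the threshold parameter `tau`, and the toy points are assumptions made for the example.

```python
import math

def leaders(points, tau):
    """Classic leaders method (illustrative sketch, not the thesis's Leader*).

    Scans the data once. Each point joins the first existing leader that lies
    within distance tau; otherwise it becomes a new leader itself.
    """
    leader_list = []  # representative points (the reduced sample)
    followers = []    # followers[i] holds indices of points assigned to leader i
    for idx, p in enumerate(points):
        for i, leader in enumerate(leader_list):
            if math.dist(p, leader) <= tau:
                followers[i].append(idx)
                break
        else:
            # No leader is close enough, so p starts a new group.
            leader_list.append(p)
            followers.append([idx])
    return leader_list, followers

# Hypothetical toy data: two tight groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
reps, groups = leaders(points, tau=0.5)
print(reps)    # leaders that stand in for the full data set
print(groups)  # follower indices per leader
```

A density-based algorithm can then run on the much smaller set of leaders, using the follower counts as a stand-in for local density. Note that the result of this basic method depends on the order in which the points are scanned.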
Keywords/Search Tags:Clustering, Density, Sample, Grouping