Research On Density Clustering Algorithm Based On Sampling And Grouping

Posted on:2021-10-29

Degree:Master

Type:Thesis

Country:China

Candidate:Y X Zheng

Full Text:PDF

GTID:2518306107993639

Subject:Engineering

Abstract/Summary:

The amount of data generated in the era of big data is growing exponentially, and these data hide great value. Finding information of interest in massive data is difficult, but data mining can extract it accurately and efficiently. Cluster analysis, a common data-mining technique, is effective for discovering information when no prior knowledge is available. Density-based clustering is an important class of cluster analysis: it can detect clusters of arbitrary shape, handle outliers, and does not require the number of clusters to be fixed in advance. The most common density-based clustering algorithm is DBSCAN, which produces good results on noisy data sets. Although DBSCAN is an attractive solution to many problems, its high time complexity, as with other clustering algorithms, prevents it from adapting well to large data sets. This thesis therefore improves DBSCAN in two respects: reducing the size of the data set that DBSCAN processes by sampling, and reducing its time complexity.

First, to reduce the size of the data set, this thesis proposes two methods for generating sample data. The first method improves the leader algorithm used in Rough-DBSCAN to provide a simple way of obtaining better results; the improved leader algorithm is called Leader*, and the resulting improved DBSCAN algorithm is called Rough*-DBSCAN. The second method, I-DBSCAN, is a new heuristic algorithm. It extracts samples and elements from the intersections of the clusters found by the Leader* algorithm, can adapt to all data sets, and produces good results without any additional parameters.

Second, to reduce the time complexity, this thesis proposes a grouping method named Groups-DBSCAN that speeds up neighborhood search queries. The grouping method builds a graph-based index structure on the data. Unlike traditional hierarchical index structures such as the R-tree, this method is suitable for high-dimensional data sets. In addition, Groups-DBSCAN remains effective when the data contain large amounts of noise, and it does not degrade the performance of DBSCAN. Groups-DBSCAN produces the same clustering result as DBSCAN, but with a shorter running time.
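
Rough-DBSCAN, and the Leader* variant proposed in this thesis, both build on the leader algorithm, which shrinks a data set in a single scan: each point joins the first existing leader within a distance threshold, or otherwise becomes a new leader. The sketch below shows only the classic leader algorithm, not the thesis's Leader* modification, whose details the abstract does not give. The function name `leader_clustering`, the threshold name `tau`, and the use of Euclidean distance are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def leader_clustering(points: Sequence[Point], tau: float) -> Tuple[List[int], List[List[int]]]:
    """Classic single-pass leader algorithm (hypothetical helper, Euclidean distance assumed).

    Returns (leaders, followers): leaders[k] is the index of the k-th leader in
    `points`, and followers[k] lists the indices of every point assigned to that
    leader, including the leader itself.
    """
    leaders: List[int] = []
    followers: List[List[int]] = []
    for i, p in enumerate(points):
        # Assign p to the first existing leader within distance tau.
        for k, li in enumerate(leaders):
            if math.dist(p, points[li]) <= tau:
                followers[k].append(i)
                break
        else:
            # No leader is close enough, so p becomes a new leader.
            leaders.append(i)
            followers.append([i])
    return leaders, followers


pts = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.1), (0.2, 0.3)]
leaders, followers = leader_clustering(pts, tau=1.0)
print(leaders, followers)  # [0, 2] [[0, 1, 4], [2, 3]]
```

The leaders form the reduced sample. The follower counts (here `[3, 2]`) record how many original points each leader stands for, which is how Rough-DBSCAN estimates density on the leaders instead of the full data set. The result depends on the scan order and on `tau`, which is one reason the leader step is a natural target for improvement.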
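
The claim that Groups-DBSCAN returns exactly the DBSCAN result rests on the index pruning only candidates that provably cannot be ε-neighbors. The sketch below illustrates that idea under stated assumptions and is not the thesis's actual Groups index. Groups are formed by a leader-style pass of radius `eps`, so every member lies within `eps` of its group's master point. Two groups are linked in a graph when their masters are at most `3 * eps` apart. By the triangle inequality, any ε-neighbor of a point can only sit in the point's own group or in a linked group whose master is within `2 * eps` of the point. The names `GroupIndex`, `region_query`, and `brute_region_query` are hypothetical, and Euclidean distance is assumed.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1


def brute_region_query(points: Sequence[Point], eps: float) -> Callable[[int], List[int]]:
    """Baseline eps-range query: compares the query point with every point."""
    def query(i: int) -> List[int]:
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]
    return query


class GroupIndex:
    """Hypothetical graph-based grouping index for exact eps-range queries."""

    def __init__(self, points: Sequence[Point], eps: float):
        self.points, self.eps = points, eps
        self.masters: List[int] = []
        self.members: List[List[int]] = []
        self.group_of: List[int] = [0] * len(points)
        # Leader-style pass: every member lies within eps of its group's master.
        for i, p in enumerate(points):
            for g, m in enumerate(self.masters):
                if math.dist(p, points[m]) <= eps:
                    self.members[g].append(i)
                    self.group_of[i] = g
                    break
            else:  # no master within eps, so p starts a new group
                self.group_of[i] = len(self.masters)
                self.masters.append(i)
                self.members.append([i])
        # Graph edges: link groups whose masters are at most 3*eps apart.
        k = len(self.masters)
        self.adj: List[List[int]] = [[g] for g in range(k)]
        for g in range(k):
            for h in range(g + 1, k):
                if math.dist(points[self.masters[g]], points[self.masters[h]]) <= 3 * eps:
                    self.adj[g].append(h)
                    self.adj[h].append(g)

    def region_query(self, i: int) -> List[int]:
        p, result = self.points[i], []
        for g in self.adj[self.group_of[i]]:
            # A group whose master is farther than 2*eps from p has no eps-neighbor of p.
            if math.dist(p, self.points[self.masters[g]]) > 2 * self.eps:
                continue
            result.extend(j for j in self.members[g]
                          if math.dist(p, self.points[j]) <= self.eps)
        return sorted(result)


def dbscan(n: int, min_pts: int, region_query: Callable[[int], List[int]]) -> List[int]:
    """Classic DBSCAN over point indices 0..n-1; eps is fixed inside region_query."""
    labels: List[Optional[int]] = [None] * n  # None = not yet visited
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        seeds, j = list(neighbors), 0
        while j < len(seeds):
            q = seeds[j]
            j += 1
            if labels[q] == NOISE:
                labels[q] = cluster  # border point: claim it, but do not expand
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = region_query(q)
            if len(q_neighbors) >= min_pts:  # q is a core point, so keep expanding
                seeds.extend(q_neighbors)
        cluster += 1
    return labels
```

Because `dbscan` only sees the neighbor lists, passing `brute_region_query(points, eps)` or `GroupIndex(points, eps).region_query` gives identical labels whenever both return the same sets. The speedup comes from scanning a few linked groups instead of all points. This simple sketch builds the group graph with a pairwise loop over masters, which is quadratic in the number of groups.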

Keywords/Search Tags:

Clustering, Density, Sample, Grouping

Related items

1	Research On Clustering Algorithm Based On Density Peak And Its Application In Text Clustering
2	Research On Active Learning Method Based On Density Clustering And Its Application
3	The Research And Application Of Text Clustering Based On Improved K-means Algorithm
4	Sample Selection Algorithms Based On Sample Entropy And Pre-clustering
5	Research On Density Peaks Clustering
6	Research Of Density-based Clustering Algorithm By KNN
7	Research And Improvement On Density-Based Clustering Algorithm
8	Multi-Improvement On Density-Based Clustering Algorithm And Its Applications
9	The Research And Improvement Of Density-based Clustering Algorithm
10	The Research And Application Of Density Peaks Clustering