DBSCAN Algorithm Based On Filtration For Datasets With Varied Densities

Posted on:2010-08-15

Degree:Master

Type:Thesis

Country:China

Candidate:L M Wu

Full Text:PDF

GTID:2178360275974457

Subject:Computer system architecture

Abstract/Summary:

Data mining uncovers implicit, previously unknown, and potentially valuable knowledge and rules. Clustering is one of its important research fields. Clustering is the process of grouping a set of physical or abstract objects into clusters of similar objects. Each cluster is a set of data objects: an object is similar to the other objects in its own cluster and dissimilar to the objects in other clusters. In many applications, the objects in one cluster can be treated as a whole. Clustering is a very useful tool for analyzing large, complex, continuous databases or data whose structure is entirely unknown.

At present, clustering algorithms fall into several categories: partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods. DBSCAN is a typical density-based method. Its merits are that it can find clusters of arbitrary shape and that its results are hardly affected by noise points. Its shortcomings are as follows. First, it depends on global parameters, and inappropriate values degrade the clustering result. Second, when the data are unevenly distributed, the clustering quality is very poor.

To address these disadvantages, a DBSCAN algorithm based on filtration is proposed. The algorithm can also reduce the number of region queries. First, it calculates the k-dist values of the dataset and applies 1-dimensional clustering to them. After the noise clusters are removed, the remaining clusters represent the primary densities of the data. Second, it derives several values of the parameter Eps, one per density level, which are then used for the filtration.

Once the parameters Eps_i are obtained, the improved algorithm clusters the dataset with each value of Eps_i. In each subsequent pass, the points that have already been clustered are ignored, which avoids marking denser and sparser areas as a single cluster.

Because the improved algorithm obtains Eps_i by 1-dimensional clustering and then clusters with the different values of Eps_i, its results better reflect the distribution of the data when that distribution is uneven.
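
The steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how the 1-dimensional clustering is done, so the gap rule in `eps_candidates` (start a new group when a k-dist value is more than `gap_ratio` times the previous one, and drop groups smaller than `min_group` as noise) is an assumption, as are all function names, the choice of the largest k-dist in a group as Eps_i, the setting MinPts = k + 1, and the densest-first order of the passes. Region queries are brute force for brevity.

```python
import math

def k_dist(points, k):
    """Distance from each point to its k-th nearest neighbour (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result

def eps_candidates(kd, gap_ratio=2.0, min_group=3):
    """Assumed 1-D clustering: split the sorted k-dist values at large relative
    gaps, drop small groups as noise, return the largest value of each group."""
    vals = sorted(kd)
    groups = [[vals[0]]]
    for v in vals[1:]:
        if v > gap_ratio * groups[-1][-1]:
            groups.append([v])          # large jump: a new density level starts
        else:
            groups[-1].append(v)
    return [g[-1] for g in groups if len(g) >= min_group]

def dbscan_pass(points, eps, min_pts, labels, next_id):
    """One DBSCAN pass restricted to points not clustered by earlier passes."""
    active = [i for i, lab in enumerate(labels) if lab is None]

    def neighbours(i):
        return [j for j in active if math.dist(points[i], points[j]) <= eps]

    for i in active:
        if labels[i] is not None:
            continue                    # already claimed during this pass
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            continue                    # not a core point at this eps
        labels[i] = next_id
        while seeds:
            j = seeds.pop()
            if labels[j] is not None:
                continue
            labels[j] = next_id
            nj = neighbours(j)
            if len(nj) >= min_pts:      # j is a core point: keep expanding
                seeds.extend(nj)
        next_id += 1
    return next_id

def filtered_dbscan(points, k=4):
    """Cluster with each Eps_i, smallest (densest) first; -1 marks noise."""
    labels = [None] * len(points)
    next_id = 0
    for eps in eps_candidates(k_dist(points, k)):
        next_id = dbscan_pass(points, eps, k + 1, labels, next_id)
    return [-1 if lab is None else lab for lab in labels]

# Toy data: a dense 3x3 grid, a sparse 3x3 grid, and one isolated point.
dense = [(x, y) for x in range(3) for y in range(3)]
sparse = [(100 + 10 * x, 100 + 10 * y) for x in range(3) for y in range(3)]
print(filtered_dbscan(dense + sparse + [(500, -500)], k=4))
```

On this toy data the sketch derives two Eps values, one for each grid. The dense grid is clustered in the first pass, the sparse grid in the second, and the isolated point remains noise. Because each pass only queries points that are still unclustered, later passes perform fewer region queries.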

Keywords/Search Tags:

Data Mining, Clustering, Varied Densities, DBSCAN, Filtration

Related items

1	Research On Adaptive Varied Density Clustering Algorithm Based On DBSCAN
2	Research On Density Clustering Algorithm Based On DBSCAN For Personalized Clustering
3	A Self-adaptive Density-based Clustering Algorithm For Discovering Density Varied Clusters
4	Research And Implementation Of Distributed Data Mining Model Based On DBSCAN
5	Research On Adaptive Clustering Algorithm Based On DBSCAN Theory
6	Construct Of J2EE-Based Data Mining System And Research On Clustering Technology
7	Research On Parallel Optimization Of Clustering Algorithms In Data Mining
8	Research On Parallization Of DBSCAN Clustering Algorithm For Spatial Data Mining Based On Spark Platform
9	Data Mining, Cluster Analysis Algorithm Research And Application
10	The Study On The Clustering Algorithms