
DBSCAN Algorithm Based On Filtration For Datasets With Varied Densities

Posted on: 2010-08-15
Degree: Master
Type: Thesis
Country: China
Candidate: L M Wu
Full Text: PDF
GTID: 2178360275974457
Subject: Computer system architecture
Abstract/Summary:
Data mining uncovers implicit, previously unknown, and potentially valuable knowledge and rules in data. Clustering, one of its important research areas, is the process of grouping a set of physical or abstract objects into clusters of similar objects. Each cluster is a set of data objects: an object is similar to the other objects in its own cluster and dissimilar to objects in other clusters. In many applications, the objects in one cluster can be treated as a whole. Clustering is a very useful tool for analyzing large, complex, continuous databases or data whose structure is entirely unknown.
At present, clustering algorithms fall into several categories: partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods. DBSCAN is a typical density-based method. Its merits are that it can find clusters of arbitrary shape and that its results are hardly affected by noise points. Its shortcomings are as follows. First, it relies on global parameters (Eps and MinPts), and inappropriate values degrade the clustering result. Second, when the density of the dataset is uneven, the clustering quality is very poor.
To address these disadvantages, a filtration-based DBSCAN algorithm is proposed, which also reduces the number of region queries. First, the algorithm computes the k-dist of the points in the dataset and applies one-dimensional clustering to the k-dist values to obtain a set of clusters. After the noise clusters are removed, the remaining clusters represent the primary densities of the dataset. Second, the improved algorithm derives one value of the parameter Eps for each density, which provides the basis for the filtration.
Once the parameters Eps_i have been obtained, the improved algorithm clusters the dataset with each value of Eps_i in turn to find the clusters.
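
The Eps estimation step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not say which one-dimensional clustering method is used, so the sketch splits the sorted k-dist values wherever a value jumps by more than a factor `gap_ratio`, treats groups smaller than `min_group_size` as noise, and takes the largest k-dist in each remaining group as Eps_i. The names `k_dist`, `estimate_eps_values`, `gap_ratio`, and `min_group_size` are hypothetical.

```python
import math


def k_dist(points, k):
    """Distance from each point to its k-th nearest neighbour (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def estimate_eps_values(kdists, gap_ratio=2.0, min_group_size=3):
    """Group k-dist values in one dimension and return one Eps per group.

    Assumptions (not taken from the thesis): a new group starts when a value
    exceeds gap_ratio times the previous value, groups with fewer than
    min_group_size members are discarded as noise, and Eps_i is the largest
    k-dist inside group i.
    """
    ordered = sorted(kdists)
    groups = [[ordered[0]]]
    for value in ordered[1:]:
        if value > gap_ratio * groups[-1][-1]:
            groups.append([value])  # large jump: a new density level begins
        else:
            groups[-1].append(value)
    return [max(g) for g in groups if len(g) >= min_group_size]


# Toy data: a dense 3x3 grid, a sparse 3x3 grid, and one far-away outlier.
dense = [(x, y) for x in range(3) for y in range(3)]
sparse = [(100 + 10 * x, 100 + 10 * y) for x in range(3) for y in range(3)]
points = dense + sparse + [(500, 500)]

eps_values = estimate_eps_values(k_dist(points, k=3))
print(eps_values)  # one Eps per density level, smallest (densest) first
```

On this toy data the outlier's k-dist forms a group of its own and is dropped, leaving one Eps value for the dense grid and one for the sparse grid.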
In each subsequent pass, points that have already been clustered are ignored, which avoids merging denser and sparser areas into one cluster.
Because the improved algorithm uses one-dimensional clustering to obtain Eps_i and then clusters with the different values of Eps_i, its results reflect the distribution of the dataset better when that distribution is uneven.
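
The multi-pass clustering described above might look like the following minimal sketch. It assumes the Eps_i values are already known and are processed from smallest to largest (densest level first), and it uses a plain brute-force DBSCAN pass in which `min_pts` counts the point itself; neither detail is confirmed by the abstract. The function names `region_query` and `multi_density_dbscan` are hypothetical.

```python
import math


def region_query(points, idx, eps, candidates):
    """Indices from `candidates` within eps of points[idx] (idx included)."""
    return [j for j in candidates if math.dist(points[idx], points[j]) <= eps]


def multi_density_dbscan(points, eps_values, min_pts):
    """One DBSCAN pass per Eps value; already clustered points are skipped.

    Assumption (not from the thesis): Eps values are processed in ascending
    order. Returns cluster ids 0, 1, ... per point, with -1 meaning noise.
    """
    labels = [-1] * len(points)
    next_id = 0
    for eps in sorted(eps_values):
        # Filtration: only points left unclustered by earlier passes take part.
        active = [i for i in range(len(points)) if labels[i] == -1]
        for i in active:
            if labels[i] != -1:
                continue  # absorbed into a cluster earlier in this pass
            neighbours = region_query(points, i, eps, active)
            if len(neighbours) < min_pts:
                continue  # not a core point at this density level
            labels[i] = next_id
            queue = list(neighbours)
            while queue:
                j = queue.pop()
                if labels[j] != -1:
                    continue
                labels[j] = next_id
                j_neighbours = region_query(points, j, eps, active)
                if len(j_neighbours) >= min_pts:
                    queue.extend(j_neighbours)  # j is a core point: expand
            next_id += 1
    return labels


# Toy data: a dense 3x3 grid, a sparse 3x3 grid, and one far-away outlier.
dense = [(x, y) for x in range(3) for y in range(3)]
sparse = [(100 + 10 * x, 100 + 10 * y) for x in range(3) for y in range(3)]
points = dense + sparse + [(500, 500)]

labels = multi_density_dbscan(points, eps_values=[1.5, 15.0], min_pts=4)
print(labels)
```

Because each pass only queries points that are still unclustered, later passes work on a smaller set, which is one plausible reading of how the filtration reduces region queries.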
Keywords/Search Tags: Data Mining, Clustering, Varied Densities, DBSCAN, Filtration