Research Of Clustering Algorithm Based On Density

Posted on: 2010-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Sun
Full Text: PDF
GTID: 2178360275985418
Subject: Applied Mathematics
Abstract/Summary:
With the development of information technology, data mining has attracted extensive attention. Data mining covers a broad research scope, and cluster analysis is one of its important research subjects. The goal of clustering is to partition a data set, without any prior knowledge, into clusters such that data within a cluster are similar and data in different clusters are dissimilar. This is what distinguishes it from data classification, and it is why clustering is also known as "unsupervised classification". Cluster analysis can be used not only as a standalone technique for discovering information about the data distribution, but also as a preprocessing step for other data mining operations, so research on improving the performance of clustering algorithms is worthwhile. Many clustering algorithms have been proposed so far, including hierarchical, partitioning, grid-based, density-based, and model-based methods.
Density-based clustering algorithms can discover clusters of arbitrary shape and identify noise, and they have been widely applied in various fields. DBSCAN is the typical density-based clustering algorithm. Because it uses a global density parameter, however, it treats some lower-density clusters as noise, while points on the boundary between two higher-density clusters can easily join them through a single connection and produce erroneous results. In addition, the algorithm must determine whether every point in the database is a core point and perform a region query for each one, which requires frequent I/O operations.
FDBSCAN is an improvement on DBSCAN. It expands clusters by choosing only some points from the neighborhoods of core points, which reduces the number of region queries and I/O operations and speeds up clustering to a certain extent. However, some objects are easily lost during clustering and become noise, which affects the clustering results.
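The DBSCAN behaviour described above (one region query per point, a core-point test, and a single global density setting) can be illustrated with a minimal sketch. This is the textbook form of DBSCAN written for illustration, not code from the thesis; the names `region_query`, `dbscan`, `eps`, and `min_pts` are assumptions, and the brute-force neighbor search stands in for whatever index a real implementation would use.

```python
import math

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    eps and min_pts are global: the same density threshold is applied
    to every region of the data set.
    """
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # only core points expand the cluster
        cluster_id += 1
    return labels


# A dense group, a sparser group, and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 13), (13, 10), (13, 13),
          (30, 30)]
tight = dbscan(points, eps=1.5, min_pts=3)
loose = dbscan(points, eps=4.5, min_pts=3)
```

With the small `eps`, only the dense group forms a cluster and the sparser group is labelled noise. With the larger `eps`, both groups are found. This is the sensitivity to a single global density setting that the abstract refers to.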
The third chapter examines a method that chooses the core object at the furthest distance and uses it to study the existing problems of the FDBSCAN algorithm in depth. It discusses in detail how objects are lost when a chosen representative object is not a core point and is therefore never queried. Finally, it proposes choosing representative objects from among the core points in the core neighborhood, which resolves the problem of lost points to a certain extent.
FDBSCAN improves on DBSCAN in speed, while RDBClustering improves on its use of a global density. The two algorithms improve DBSCAN from different points of view, but each still has deficiencies. The former speeds up clustering to a certain extent but cannot handle data of inhomogeneous density; the latter solves the global density problem, but it is slow and requires a large amount of memory. To address these problems, the fourth chapter proposes a new algorithm, a fast relative density-based clustering algorithm (FRDBClustering), which combines the advantages of both: it removes the dependence on the global parameter of DBSCAN and also speeds up clustering to a certain extent relative to DBSCAN. Simulation experiments indicate the validity of the method.
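The difference between picking the furthest neighbor as a representative and restricting that choice to core points can be shown with a small sketch. The abstract does not give the exact selection rule, so everything here is an assumption made for illustration: the helper names `neighborhood`, `is_core`, and `farthest_representative` are hypothetical, and "furthest" is taken to mean the largest Euclidean distance from the core point within its `eps`-neighborhood.

```python
import math


def neighborhood(points, i, eps):
    """Indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def is_core(points, i, eps, min_pts):
    """A point is core if its eps-neighborhood holds at least min_pts points."""
    return len(neighborhood(points, i, eps)) >= min_pts


def farthest_representative(points, i, eps, min_pts, core_only=True):
    """Pick the neighbor of points[i] that lies furthest from it.

    With core_only=True, only neighbors that are themselves core points
    are candidates (an assumed reading of the improved selection).
    Returns an index, or None if there is no candidate.
    """
    candidates = [j for j in neighborhood(points, i, eps) if j != i]
    if core_only:
        candidates = [j for j in candidates if is_core(points, j, eps, min_pts)]
    if not candidates:
        return None
    return max(candidates, key=lambda j: math.dist(points[i], points[j]))


# Index 0 is the core point being expanded; index 4 sits at the edge
# of its neighborhood and is not a core point itself.
pts = [(0, 0), (1, 0), (-1.5, 0), (0, 1.2), (1.9, 0)]
unrestricted = farthest_representative(pts, 0, eps=2.0, min_pts=4, core_only=False)
restricted = farthest_representative(pts, 0, eps=2.0, min_pts=4, core_only=True)
```

Here the unrestricted choice returns index 4, a non-core point from which no further expansion would happen, while the restricted choice returns index 3, which is a core point and can continue the expansion.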
Keywords/Search Tags: cluster analysis, fast algorithm, core point, representative objects, relative density