
Research Of Clustering Algorithm Based On Relative Density

Posted on: 2013-02-22  Degree: Master  Type: Thesis
Country: China  Candidate: X C Li  Full Text: PDF
GTID: 2218330371961567  Subject: Computer application technology
Abstract/Summary:
Cluster analysis is an active research topic in data mining, and the many existing clustering algorithms suit different application environments according to their characteristics. Among them, traditional density-based clustering algorithms have been widely used for their good scalability, noise immunity, and ability to discover clusters of arbitrary shape. However, they adopt global density parameters, also called absolute density, as the metric for all clusters, so they have a critical weakness: they cannot find clusters of different density levels. Adopting relative density as the cluster metric addresses this problem, which has led to relative density-based clustering algorithms.

Meanwhile, the data to be clustered is usually dynamic in practical applications; when the data changes, the clustering patterns already mined should be updated as well. How to design an incremental clustering algorithm that saves computing resources and improves clustering efficiency has therefore become an important challenge in cluster analysis.

First, the paper describes background on clustering and introduces commonly used data mining concepts such as the definition of clustering, similarity measurement, and density. It also reviews traditional clustering algorithms and gives their classification and a performance comparison.

Second, after analyzing the defects of the traditional density-based clustering algorithm, the paper presents a Relative Density-Based Clustering Algorithm for Mixture Data Sets (M_RDBCA). The algorithm overcomes the defects of the traditional density-based algorithm while keeping its advantages, namely the ability to find clusters of arbitrary shape and insensitivity to noise. Because the algorithm defines a distance for mixture data in order to measure the similarity of mixed-attribute objects, and introduces the concept of a pure neighbor, it is suitable not only for numerical data but also for categorical data and mixture data. The algorithm can distinguish clusters of different density levels because it adopts relative density as the clustering metric, and it proposes the concept of a pure core set of objects, so that the objects in one cluster are better integrated. In addition, the paper provides a theoretical basis for setting the parameter, which avoids DBSCAN's problem that clustering results are overly sensitive to algorithm parameters.

Finally, we carried out in-depth research on incremental clustering methods based on M_RDBCA in a data warehouse environment. First, the paper describes the data model and basic ideas of the incremental clustering algorithm. Second, it discusses M_RDBCA-based incremental clustering methods in three different modes of operation:
① Clustering the affected set: after determining the objects that can be affected by the insertion or deletion of an object, the incremental clustering algorithm is performed only on the affected set.
② Single update mode: according to the impact on the clustering caused by inserting or deleting an object, clustering operations such as merge, split, or absorb are carried out. An experiment is carried out for performance comparison, and a performance speed-up chart is drawn.
③ Batch update mode: the paper briefly states the idea and method of incremental clustering in batch mode, which analyzes the impact on the clustering by considering not only the existing objects in the database but also the objects to be inserted and deleted in the update table.
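
The abstract does not give the formulas used by M_RDBCA, so the following is only a minimal sketch of the general idea, under stated assumptions rather than the thesis's own definitions. It assumes a hypothetical mixed-attribute distance (Euclidean distance on numeric fields plus a mismatch count on categorical fields) and a hypothetical relative density (a point's k-nearest-neighbor density divided by the mean density of those neighbors). All function names and the parameter `k` are illustrative.

```python
import math

def mixed_distance(a, b, numeric_idx, categorical_idx):
    """Assumed distance: Euclidean on numeric fields + mismatches on categorical fields."""
    num = sum((a[i] - b[i]) ** 2 for i in numeric_idx)
    cat = sum(1 for i in categorical_idx if a[i] != b[i])
    return math.sqrt(num) + cat

def k_nearest(points, p, k, dist):
    """Indices of the k points closest to points[p], excluding p itself."""
    others = [j for j in range(len(points)) if j != p]
    others.sort(key=lambda j: dist(points[p], points[j]))
    return others[:k]

def density(points, p, k, dist):
    """Assumed absolute density: inverse of the mean distance to the k nearest neighbors."""
    nbrs = k_nearest(points, p, k, dist)
    mean_d = sum(dist(points[p], points[j]) for j in nbrs) / k
    return 1.0 / (mean_d + 1e-12)  # small constant guards against division by zero

def relative_density(points, p, k, dist):
    """Assumed relative density: own density divided by the mean density of the neighbors."""
    nbrs = k_nearest(points, p, k, dist)
    nbr_density = sum(density(points, j, k, dist) for j in nbrs) / k
    return density(points, p, k, dist) / nbr_density

# Toy data: a tight cluster, a loose cluster, and one isolated object.
# Each object is (x, y, category).
points = [
    (0.0, 0.0, "a"), (0.1, 0.0, "a"), (0.0, 0.1, "a"), (0.1, 0.1, "a"),
    (10.0, 10.0, "b"), (12.0, 10.0, "b"), (10.0, 12.0, "b"), (12.0, 12.0, "b"),
    (50.0, 50.0, "c"),
]
dist = lambda a, b: mixed_distance(a, b, numeric_idx=[0, 1], categorical_idx=[2])

for p in (0, 4, 8):
    print(p, density(points, p, 2, dist), relative_density(points, p, 2, dist))
```

In this toy data the absolute densities of the tight and loose clusters differ by more than an order of magnitude, so a single global threshold cannot fit both, while their relative densities are both close to 1 and the isolated object's relative density is far below 1. This is the motivation for relative density that the abstract describes, shown under the assumptions above.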
Keywords/Search Tags: relative density, cluster analysis, mixture data, incremental clustering, single update mode, batch update mode