
Clustering Algorithms Analysis On Data Dimension

Posted on: 2008-10-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Zhang
GTID: 1118360245492669
Subject: Computer application technology
Abstract:
This dissertation analyzes the characteristics and requirements of clustering algorithms across different data dimensionalities, and presents solutions and algorithms for low-dimensional clustering, high-dimensional clustering, and subspace clustering.

For low-dimensional clustering we design an algorithm called GDMS. A detecting window is introduced to discover the distribution of the data set, and a filter-function framework distinguishes clusters of different density or attributes. Different movement styles and their combinations are designed to speed up the algorithm while maintaining high accuracy. A new mathematical morphology operator is introduced to discover clusters; it is more accurate than the ordinary open and close operators. Scale-space theory is integrated with mathematical morphology to obtain multi-scale, hierarchical clusters, and GDMS is further extended to support clustering in the presence of obstacles. The advantages of GDMS are: it is very efficient, with O(N) complexity; it discovers clusters of arbitrary shape; it distinguishes clusters of different density or attributes; it is insensitive to noise and to the grid size; and its hierarchical result is easy to understand and interpret.

To speed up high-dimensional clustering we design three algorithms: MDCLUS, IMDCLUS, and PMDCLUS. A Monte Carlo method is adopted to acquire core objects with reduced computational cost, and the minimum sample ratio is estimated quantitatively to avoid losing or breaking clusters. A label-hash method makes cluster merging linear in the number of objects. We also realize incremental clustering and distributed parallel clustering.
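The grid-plus-morphology idea behind GDMS can be illustrated with a minimal sketch. The thesis does not publish GDMS's operators, so the fragment below uses only the classical building blocks it mentions: a binary grid of dense cells, a morphological closing (dilation then erosion) with a 3x3 structuring element to repair small holes, and 8-connected component labeling to extract clusters. All function names are illustrative, not the author's.

```python
from collections import deque

def dilate(cells):
    """Morphological dilation of a binary grid with a 3x3 structuring element."""
    out = set(cells)
    for (i, j) in cells:
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                out.add((i + di, j + dj))
    return out

def erode(cells):
    """Morphological erosion with the same 3x3 structuring element."""
    return {(i, j) for (i, j) in cells
            if all((i + di, j + dj) in cells
                   for di in (-1, 0, 1) for dj in (-1, 0, 1))}

def label_clusters(cells):
    """Label 8-connected components of the binary grid: one label per cluster."""
    labels, next_label = {}, 0
    for start in cells:
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    nb = (i + di, j + dj)
                    if nb in cells and nb not in labels:
                        labels[nb] = next_label
                        queue.append(nb)
        next_label += 1
    return labels

# Two dense blobs; blob A has a one-cell hole that closing repairs.
dense = {(i, j) for i in range(3) for j in range(3)} - {(1, 1)}
dense |= {(i, j) for i in range(10, 13) for j in range(10, 13)}
closed = erode(dilate(dense))       # closing fills the hole at (1, 1)
clusters = label_clusters(closed)
print(len(set(clusters.values())))  # -> 2
```

Closing is exactly the kind of "ordinary operator" the abstract says GDMS improves upon: it removes small gaps inside a cluster without merging well-separated clusters, here yielding the two expected components.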
The advantages of this approach are: it discovers clusters of arbitrary shape; it is insensitive to noise; the clustering complexity is linear in the data dimensionality; and it is faster than DBSCAN.

For subspace clustering we design active-space and grid algorithms. We prove the downward closure properties of density, connection, and coverage, and introduce a top-down search method to reduce the search space. A new filter based on the number of active axes removes noise objects. The fixed-grid algorithm is extended to an adaptive grid for efficiency, and a distributed parallel version is realized for large, high-dimensional data sets. A hierarchical method organizes the clustering result. The advantages of this approach are: it discovers clusters in both the entire space and subspaces; its complexity is approximately linear in the number of objects, the space dimensionality, and the cluster dimensionality; it is insensitive to noise; its result is easy to understand and interpret; and it can find both disjoint and overlapping clusters.
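The role of downward closure in a top-down subspace search can be sketched as follows. This is an assumption-laden toy, not the thesis's algorithm: it grids each projection, calls a subspace dense if some cell holds enough points, and exploits the fact that every projection of a dense subspace is itself dense, so a dense subspace can be reported as maximal and its children skipped, while only sparse subspaces are split further. All names (`dense_units`, `maximal_dense_subspaces`) are hypothetical.

```python
from itertools import combinations

def dense_units(points, dims, cell=1.0, min_pts=3):
    """Project points onto the subspace `dims`, grid them, and return
    the cells holding at least `min_pts` points."""
    counts = {}
    for p in points:
        key = tuple(int(p[d] // cell) for d in dims)
        counts[key] = counts.get(key, 0) + 1
    return {c for c, n in counts.items() if n >= min_pts}

def maximal_dense_subspaces(points, n_dims, cell=1.0, min_pts=3):
    """Top-down search from the full space. By downward closure of
    density, every projection of a dense subspace is dense too, so a
    subspace that already holds dense units is reported as maximal and
    not expanded; only sparse subspaces are split into their
    (k-1)-dimensional children."""
    found, seen = [], set()
    frontier = [tuple(range(n_dims))]
    while frontier:
        next_frontier = []
        for dims in frontier:
            if dims in seen or not dims:
                continue
            seen.add(dims)
            if dense_units(points, dims, cell, min_pts):
                found.append(dims)  # maximal: stop descending here
            else:
                next_frontier.extend(combinations(dims, len(dims) - 1))
        frontier = next_frontier
    # Drop subspaces already covered by a reported superspace.
    return [s for s in found if not any(set(s) < set(t) for t in found)]

# Five points clustered in dimensions 0 and 1 but spread along dimension 2:
# no cluster exists in the full space, only in the subspace (0, 1).
pts = [(0.1, 0.2, float(i)) for i in range(5)]
print(maximal_dense_subspaces(pts, 3))  # -> [(0, 1)]
```

The same closure property is what makes the pruning sound: since the full space (0, 1, 2) is sparse, the search must descend, but once (0, 1) is found dense, none of its sub-subspaces need to be visited through it.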
Keywords/Search Tags: Incremental clustering, Parallel clustering, Subspace clustering, Mathematical morphology, Scale space