Research On Partition-Based Efficient Clustering Algorithm For Large-Scale Data

Posted on:2011-05-22

Degree:Master

Type:Thesis

Country:China

Candidate:L L Meng

Full Text:PDF

GTID:2178360302494484

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

As one of partitioning-based clustering algorithm, the k-means algorithm is widely used in various aspects. However, these algorithms can not cluster effectively on large databases. This paper has mainly concerned on how to incorporate grid-based methods constraints in the k-means algorithm, how to solve some problems in grid method, how to improve k-modes clustering algorithm and how to apply it to software security detecting. The research on these problems is important with broad applications, including pattern recognition, data analysis, market research and other cluster analysis of related processing, and so on.First of all, this paper presents an improved clustering algorithm based on grid-density, CABGD. In CABGD, we put forward a concept of Center Intensity of Grid Cell, CIGC. The distribution of data in some grid is identified by calculating the value of CIGC of the grid. CABGD solves the problem of low clustering accuracy that the division of grid by human leads to in the traditional grid-based algorithm. The results show that CABGD has higher accuracy than traditional grid-based clustering algorithm.Secondly, an improved k-means clustering algorithm based on grid, IKMG, is presented. Every single grid is regard as a basic processing unit in IKMG. First, the algorithm further improve the concept of CIGC from second chapter, and design a new data structure—clustering tree. The clustering tree is used to organize and adjust the cluster. The process of clustering is the process of constructing and merging the tree. Finally the clustering results is k cluster trees. For large-scale data, the algorithm has better time efficiency than traditional k-means algorithm, can detect clusters of arbitrary shapes and sizes and do not have to specify the k value at the beginning.Finally, the paper designs a data structure—similar feature tree, SFT. As a software fault detection tool, SFT is mainly used to improve software security detection efficiently. In order to construct a SFT, an improved k-modes clustering algorithm, IKMD, is presented. In IKMD, the choice of initial modes is carried out in the entire clustering process. The clustering results are given in the form of k trees. Finally, the forests containing k trees are conversed to a binary tree , that is the SFT. According to the character of apriori, SFT detect fault of software program by the principles of the first left and then right. The results showed that the method is feasible and effective.

Keywords/Search Tags:

partitioning clustering, clustering, k-means, grid, k-modes

PDF Full Text Request

Related items

1	Partitioning Clustering Method Based On Geometric Features
2	A Deep Embedding Clustering Algorithm Considering Preservation Of Initial Clustering Structure And Its Application
3	Probabilistic K-means Models Via Nonlinear Programming
4	Research And Application Of Cluster Algorithm Based On Partitioning Method
5	The Research And Application Of Text Clustering Based On Improved K-means Algorithm
6	Research On K-modes Clustering Algorithm Of Dissimilarity Measure
7	Fuzzy C-means And K-means Clustering Algorithm And Its Parallel
8	Research On Improvement Of K-means Clustering Algorithm
9	Research On Ensemble-Initialized K-Means Clustering Algorithms
10	Cluster Study Based On Functional Magnetic Resonance Imaging Data