Research On Clustering Algorithm Based On Relative Entropy And Grid-density Filter

Posted on:2014-03-14

Degree:Master

Type:Thesis

Country:China

Candidate:D Wang

Full Text:PDF

GTID:2268330392964470

Subject:Computer application technology

Abstract/Summary:

Clustering analysis is an important branch of data mining, and research on clustering algorithms has both theoretical significance and practical value. Traditional density- and grid-based clustering algorithms depend heavily on their input parameters, their clustering accuracy is limited, and they suffer from the dimension effect (the curse of dimensionality) on high-dimensional data sets. In response to these problems, this thesis studies two algorithms: a grid- and density-based clustering algorithm that uses relative entropy, and a subspace clustering algorithm based on a grid and density filter.

First, to reduce the dependency on input parameters and improve clustering accuracy, the thesis presents a grid- and density-based clustering algorithm with relative entropy. The algorithm defines the concept of grid relative entropy and derives the density threshold by calculating the minimum grid relative entropy. Users therefore no longer need to supply a density threshold, an inappropriate value of which can distort the clustering results. Furthermore, for data points in sparse grid cells, the algorithm determines their membership degree by calculating the Euclidean distance to the centroids of the adjacent dense grid cells. Extracting boundary points in this way improves the accuracy of the clustering results.

Second, to address the dimension effect when clustering high-dimensional data sets, the thesis presents a subspace clustering algorithm based on a grid and density filter. The core of this algorithm is a two-layer filtering structure. The first layer, the hypercube grid filter, improves on the traditional grid structure: hypercube grids are constructed by merging adjacent grid cells while checking their boundaries. This avoids an exponential search over all subspaces and improves the efficiency of the algorithm. The filter then deletes the hypercube grids that do not meet the threshold, which reduces the set of cluster candidates. The second layer, the minimum density threshold filter, defines a variable density threshold in order to adapt to high-dimensional data sets. The minimum density threshold is then calculated, and the output of the hypercube grid filter is filtered according to the density monotonicity principle. This preserves the integrity of the clustering results while pruning noise data.

Finally, both algorithms are implemented in Java, and experiments are carried out on synthetic and real data sets.
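
The boundary-point step described in the abstract (assigning a point from a sparse grid cell according to its Euclidean distance to the centroids of adjacent dense cells) can be illustrated with a minimal Java sketch. It is not the thesis implementation: the class name `SparseCellAssigner`, its method names, and the assumption that the caller has already identified the adjacent dense cells and collected their points are all hypothetical choices made for illustration.

```java
// Illustrative sketch only: all names here are hypothetical and not taken from the thesis.
final class SparseCellAssigner {

    private SparseCellAssigner() {}

    /** Arithmetic mean of the points in one grid cell, used as the cell centroid. */
    static double[] centroid(double[][] cellPoints) {
        if (cellPoints.length == 0) {
            throw new IllegalArgumentException("cell has no points");
        }
        int dims = cellPoints[0].length;
        double[] mean = new double[dims];
        for (double[] p : cellPoints) {
            for (int d = 0; d < dims; d++) {
                mean[d] += p[d];
            }
        }
        for (int d = 0; d < dims; d++) {
            mean[d] /= cellPoints.length;
        }
        return mean;
    }

    /** Euclidean distance between two points of equal dimension. */
    static double euclideanDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Index of the nearest centroid among the adjacent dense cells,
     * or -1 when the sparse cell has no adjacent dense cell.
     * Assumption: the caller has already determined which cells are adjacent and dense.
     */
    static int nearestDenseCell(double[] point, double[][] adjacentDenseCentroids) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < adjacentDenseCentroids.length; i++) {
            double dist = euclideanDistance(point, adjacentDenseCentroids[i]);
            if (dist < bestDistance) {
                bestDistance = dist;
                best = i;
            }
        }
        return best;
    }
}
```

In this sketch a point would join the cluster of the dense cell whose index is returned. How a result of -1 is handled (for example, treating the point as noise) is an assumption the abstract does not settle.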

Keywords/Search Tags:

clustering, density, grid, relative entropy, two-level filter

Related items

1	Research On Optimization Of Adaptive Density Partition Clustering Algorithm
2	Grid-based And Information Entropy-based Clustering Algorithm
3	Non-uniform Data Clustering Method Based On Relative Density
4	Parameter Grid Clustering Algorithm
5	Research And Improvement Of The Clustering Algorithm Based On Sparsity Score Entropy And Density Entropy
6	Research On An Effective Self Adapted Grid-Density Based Clustering
7	Research On Image Segmentation Methods Based On Density Peak Clustering
8	Research Of Clustering Algorithm Based On Relative Density
9	Grid Clustering Research Based On Impact Factor Of Grid Density
10	Application Of Grid And Density Based Clustering Algorithm In Data Mining