
Research On Clustering Algorithm Based On Relative Entropy And Grid-density Filter

Posted on: 2014-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: D Wang
Full Text: PDF
GTID: 2268330392964470
Subject: Computer application technology
Abstract/Summary:
Clustering analysis is an important branch of data mining, and research on clustering algorithms has both theoretical significance and practical value. Traditional clustering algorithms based on density and grids depend heavily on their input parameters. Their clustering accuracy is limited, and they readily suffer from the dimension effect when clustering high-dimensional data sets. In response to these problems, this thesis studies two algorithms: a grid- and density-based clustering algorithm that uses relative entropy, and a subspace clustering algorithm based on a grid and density filter.

Firstly, to reduce the dependency on input parameters and improve clustering accuracy, this thesis presents a grid- and density-based clustering algorithm with relative entropy. The algorithm defines the concept of grid relative entropy and derives the density threshold by calculating the minimum grid relative entropy. Users therefore do not need to supply a density threshold, so an inappropriate value can no longer skew the clustering results. Further, for data points that fall in sparse grids, the algorithm marks their membership degree by calculating the Euclidean distance to the centroid of each adjacent dense grid. Extracting boundary points in this way improves the accuracy of the clustering results.

Secondly, to address the dimension effect when clustering high-dimensional data sets, this thesis presents a subspace clustering algorithm based on a grid and density filter. The core of this algorithm is a two-layer filtering structure. In the first layer, the hypercube grid filter, the traditional grid structure is improved: hypercube grids are constructed by merging adjacent grids while checking their boundaries. This avoids an exponential search over all subspaces and effectively improves the efficiency of the algorithm. Hypercube grids that do not meet the threshold are deleted, which reduces the set of cluster candidates.
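As a rough illustration of the boundary-point step in the first algorithm above, the following Python sketch assigns each point in a sparse grid cell to the cluster of the adjacent dense cell whose centroid is nearest in Euclidean distance. It is only a sketch under stated assumptions, not the thesis's implementation: the names `centroid`, `adjacent` and `assign_sparse_points` are hypothetical, adjacency is assumed to mean "cell indices differ by at most 1 on every axis", and the membership degree is simplified to a hard assignment to the nearest centroid.

```python
import math


def centroid(points):
    """Mean of a non-empty list of equal-length coordinate tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def adjacent(cell_a, cell_b):
    """Assumed adjacency rule: indices differ by at most 1 on every axis."""
    return cell_a != cell_b and all(abs(a - b) <= 1 for a, b in zip(cell_a, cell_b))


def assign_sparse_points(cells, dense_labels):
    """Assign points of sparse cells to the nearest adjacent dense cell.

    cells: dict mapping a cell index tuple to the list of points in that cell.
    dense_labels: dict mapping each dense cell index to its cluster id.
    Returns a dict mapping each sparse-cell point to a cluster id, or to None
    when the cell has no adjacent dense cell (the point is treated as noise).
    """
    centroids = {c: centroid(cells[c]) for c in dense_labels}
    result = {}
    for cell, points in cells.items():
        if cell in dense_labels:
            continue  # dense cells already carry a cluster label
        neighbours = [c for c in dense_labels if adjacent(cell, c)]
        for p in points:
            if not neighbours:
                result[p] = None
                continue
            nearest = min(neighbours, key=lambda c: math.dist(p, centroids[c]))
            result[p] = dense_labels[nearest]
    return result
```

Points are passed as tuples so they can serve as dictionary keys. A graded membership degree, if that is what the thesis intends, could be derived from the same distances instead of taking only the minimum.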
In the minimum density threshold filter, we define the concept of a variable density threshold in order to adapt to high-dimensional data sets. We then calculate the minimum density threshold and filter the output of the hypercube grid filter according to the density monotonicity principle. This preserves the integrity of the clustering results while pruning noise data.

Lastly, both algorithms are implemented in Java, and experiments are carried out on synthetic and real data sets.
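The pruning idea behind the minimum density threshold filter described above can be sketched in Python as follows. This is a hedged illustration, not the thesis's method. It assumes the usual reading of density monotonicity (a unit can only be dense if all of its lower-dimensional projections are dense). The abstract does not define the variable density threshold, so it appears here as a caller-supplied function `threshold(k)`, which is purely a placeholder. The names `projections` and `filter_candidates` and the unit representation are hypothetical as well.

```python
from itertools import combinations


def projections(unit):
    """All (k-1)-dimensional projections of a k-dimensional unit.

    A unit is a frozenset of (dimension, interval_index) pairs.
    """
    return [frozenset(sub) for sub in combinations(sorted(unit), len(unit) - 1)]


def filter_candidates(candidates, dense_lower, counts, threshold):
    """Keep candidates that pass the monotonicity check and the density check.

    candidates: iterable of k-dimensional units.
    dense_lower: set of (k-1)-dimensional units already known to be dense.
    counts: dict mapping a unit to the number of points it contains.
    threshold: assumed callable giving the minimum count for dimensionality k.
    """
    kept = set()
    for unit in candidates:
        # Monotonicity: one non-dense projection rules the unit out.
        if any(p not in dense_lower for p in projections(unit)):
            continue
        # Density check against the dimension-dependent threshold.
        if counts.get(unit, 0) >= threshold(len(unit)):
            kept.add(unit)
    return kept
```

Checking the projections first means a candidate can be discarded without consulting its point count, which is where this kind of pruning typically saves work.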
Keywords/Search Tags:clustering, density, grid, relative entropy, two-level filter