Research On Clustering Algorithm Of High Dimensional Data

Posted on: 2020-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: C Q Ni
Full Text: PDF
GTID: 2428330590994847
Subject: Probability theory and mathematical statistics
Abstract/Summary:
In data mining, machine learning algorithms are used to extract valuable information from existing data sets. One of the most important of these tools is cluster analysis, an unsupervised learning method. In recent years, with the rapid development of computer technology, the amount of data to be processed has grown steadily. Clustering algorithms for low-dimensional data are now relatively mature, but because of the 'curse of dimensionality', classical clustering algorithms often fail when applied to high-dimensional data. Cluster analysis of high-dimensional data has therefore become a very active research field.

The CLIQUE algorithm, a subspace clustering algorithm, is one of the most important and widely used clustering algorithms. Starting from an analysis of CLIQUE, this thesis summarizes its advantages and limitations and improves two aspects of it: the fixed grid partition and the user-specified input parameters. The improved algorithm uses dynamic entropy to partition the grid dynamically. It also proposes definitions of the grid parameter and the density threshold, so that both are derived rather than supplied by hand. This reduces the influence of human input and the algorithm's dependence on the user's prior knowledge.

Finally, the improved algorithm is tested on three real data sets selected from the UCI repository and compared with the original algorithm and with other classical algorithms. Four classical clustering evaluation indexes (Precision, F-measure, RI, ARI) are used to assess the results. The experiments show that the improved CLIQUE algorithm is effective for clustering high-dimensional data and mitigates the 'curse of dimensionality' to some extent. Because it does not require users to provide the grid parameter or the density threshold, it avoids the difficulty of choosing these parameters by hand, and it shows a significant improvement in clustering efficiency. The improved CLIQUE algorithm outperforms the original algorithm and has practical value for cluster analysis of high-dimensional data.
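To make the two user-supplied parameters concrete, the following is a minimal sketch of the dense-unit search in the original, fixed-grid CLIQUE, not of the improved algorithm proposed in the thesis. It assumes each dimension is cut into xi equal-width intervals and that a grid unit is dense when the fraction of points inside it exceeds tau. The function names (grid_cell, dense_units), the max_dim cap and the sample points are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def grid_cell(value, lo, hi, xi):
    """Return the index (0..xi-1) of the equal-width interval holding value."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * xi)
    return min(idx, xi - 1)  # the maximum value falls into the last interval


def dense_units(points, xi, tau, max_dim=2):
    """Find dense grid units bottom-up, from 1-D subspaces to max_dim-D ones.

    Returns a dict mapping (dims, unit) -> point count, where dims is a tuple
    of dimension indices and unit is the tuple of interval indices in them.
    """
    n = len(points)
    d = len(points[0])
    lows = [min(p[j] for p in points) for j in range(d)]
    highs = [max(p[j] for p in points) for j in range(d)]
    # Interval index of every point in every dimension (fixed grid).
    cells = [[grid_cell(p[j], lows[j], highs[j], xi) for j in range(d)]
             for p in points]

    def count(dims):
        # Number of points per grid unit in the subspace spanned by dims.
        return Counter(tuple(c[j] for j in dims) for c in cells)

    # Level 1: dense units in each single dimension.
    level = {}
    for j in range(d):
        for unit, cnt in count((j,)).items():
            if cnt / n > tau:
                level[((j,), unit)] = cnt
    result = dict(level)

    k = 1
    while level and k < max_dim:
        k += 1
        # Candidate k-D subspaces are built only from (k-1)-D subspaces that
        # already contain dense units (the Apriori-style pruning idea).
        prev_dims = sorted({dims for dims, _ in level})
        candidates = {tuple(sorted(set(a) | set(b)))
                      for a, b in combinations(prev_dims, 2)
                      if len(set(a) | set(b)) == k}
        level = {}
        for dims in candidates:
            for unit, cnt in count(dims).items():
                if cnt / n > tau:
                    level[(dims, unit)] = cnt
        result.update(level)
    return result


# Hypothetical sample: two groups of points near opposite corners plus noise.
points = [(0.0, 0.0), (0.05, 0.1), (0.1, 0.05), (0.12, 0.08),
          (0.9, 0.95), (0.95, 0.9), (1.0, 1.0), (0.92, 0.97),
          (0.5, 0.2), (0.2, 0.5)]
units = dense_units(points, xi=4, tau=0.3)
```

In this sketch both xi and tau must be guessed by the user, and a poor choice changes which units count as dense. That sensitivity is the limitation the thesis targets by deriving the grid parameter and density threshold instead of asking for them.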
Keywords/Search Tags: high dimensional data, clustering algorithm, CLIQUE, dynamic divided grid, grid parameter, density threshold