Research on Multidimensional Data Clustering Algorithms and GPU Acceleration Based on Global K-means

Posted on: 2013-10-07; Degree: Master; Type: Thesis
Country: China; Candidate: Z N Hui; Full Text: PDF
GTID: 2248330395457315; Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
As production and daily life grow more complex, the volume of data is increasing rapidly, and data mining plays an increasingly important role. Clustering analysis is an important data mining technique for analyzing all kinds of data. This thesis analyzes clustering algorithms for multidimensional data and proposes two such algorithms. To reduce the processing time for massive data, it also studies GPU acceleration of the related algorithms. The main research work is as follows:

In multidimensional data clustering, each dimension (attribute) of the data contributes differently to each cluster. This thesis therefore presents an attribute-weighted K-means algorithm built on the Global K-means framework, namely Global Weighted K-means (GWKM). GWKM combines the attribute weights of the LAW K-means algorithm with the Global K-means (GKM) clustering framework: LAW K-means is introduced into the GKM selection process for each cluster center, so the algorithm obtains not only the cluster centers but also the attribute weights of each cluster. It achieves good experimental results.

However, when the data are highly sparse in some dimensions, clustering multidimensional data becomes much harder. To address the clustering problems caused by data sparseness, this thesis proposes a new entropy-weighted algorithm based on Global K-means, namely Global Entropy Weighted K-means (GEWKM).
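Both proposed algorithms build on the Global K-means framework, which grows the solution one cluster center at a time. The following is a minimal pure-Python sketch of that plain framework only, without the attribute weights that GWKM and GEWKM add. The function names (`sq_dist`, `mean`, `kmeans`, `global_kmeans`) are illustrative assumptions and are not taken from the thesis.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(data, centers, max_iter=100):
    """Plain K-means from the given initial centers.

    Returns (centers, clustering_error).
    """
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in data:
            nearest = min(range(len(centers)),
                          key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous center.
        updated = [mean(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    error = sum(min(sq_dist(p, c) for c in centers) for p in data)
    return centers, error


def global_kmeans(data, k):
    """Incrementally build a k-center solution (hypothetical helper)."""
    centers = [mean(data)]  # the 1-cluster solution is the data centroid
    for _ in range(2, k + 1):
        best = None
        # Try every data point as the starting position of the new center,
        # keeping the centers found so far, and keep the lowest-error run.
        for candidate in data:
            trial, err = kmeans(data, centers + [list(candidate)])
            if best is None or err < best[1]:
                best = (trial, err)
        centers = best[0]
    return centers
```

Each added center triggers one K-means run per data point, which illustrates why the framework becomes expensive on large data sets.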
The GEWKM algorithm combines the entropy weighting of the Entropy Weighting K-means algorithm with the Global K-means clustering framework: Entropy Weighting K-means is introduced into the GKM selection process for each cluster center, and its more reasonable entropy weighting is used to compute the attribute weights, which yields better experimental results. The experiments show that the proposed algorithm is stable and effectively handles the data sparseness problem in multidimensional data clustering.

Because the proposed GWKM and GEWKM algorithms are both based on the Global K-means clustering framework, they inherit its high computational complexity, which restricts its use on massive data. To reduce this cost and meet the time requirements of mining massive data, this thesis proposes a GPU-based parallel Global K-means algorithm, PGKM_Mix, which parallelizes the most time-consuming step, the selection of cluster centers. To exploit the data parallelism of PGKM_Mix more fully, it further proposes the PGKM_IRG algorithm, which introduces an irregular reduction method to parallelize the cluster center update. The thesis discusses the design and implementation details of these two parallel algorithms on the GPU. Experiments on artificial data sets and UCI data sets show that the proposed parallel algorithms achieve a high speedup without affecting clustering performance, which demonstrates their validity.

This work was supported by the Basic Research Operating Expenses Fund Project of Xidian University (Grant No. JY10000902033).
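The cluster center update is an irregular reduction because the accumulator each point adds to is chosen by that point's cluster label, so it is known only at run time. The sketch below simulates one common strategy, privatization, sequentially in Python: each simulated work-group accumulates into its own private sums, and the partial results are then merged. This is an illustration under stated assumptions, not the thesis's OpenCL kernels, and the names `partial_sums`, `update_centers`, and `n_groups` are hypothetical.

```python
def partial_sums(points, labels, k, dim):
    """Private accumulators of one simulated work-group.

    Returns per-cluster coordinate sums and per-cluster point counts.
    """
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1  # the target index depends on the data (irregular)
        for d in range(dim):
            sums[c][d] += p[d]
    return sums, counts


def update_centers(points, labels, k, old_centers, n_groups=4):
    """Recompute cluster centers from points and their cluster labels."""
    dim = len(points[0])
    chunk = max(1, (len(points) + n_groups - 1) // n_groups)
    # Phase 1: each group reduces its own slice (parallel on a real GPU).
    partials = [
        partial_sums(points[s:s + chunk], labels[s:s + chunk], k, dim)
        for s in range(0, len(points), chunk)
    ]
    # Phase 2: merge the private accumulators into the final centers.
    centers = []
    for c in range(k):
        total = sum(cnt[c] for _, cnt in partials)
        if total == 0:
            centers.append(list(old_centers[c]))  # empty cluster: keep old
            continue
        centers.append(
            [sum(s[c][d] for s, _ in partials) / total for d in range(dim)]
        )
    return centers
```

Private accumulators avoid concurrent writes to the same cluster sum at the cost of an extra merge step; atomic additions on shared accumulators would be an alternative design.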
Keywords/Search Tags: data mining, clustering analysis, Global K-means, multidimensional data clustering, GPU acceleration, OpenCL