Font Size: a A A

Research On Balanced Clustering Algorithm Based On CPU-GPU Collaborative Computing

Posted on:2021-01-16Degree:MasterType:Thesis
Country:ChinaCandidate:H B TangFull Text:PDF
GTID:2428330647461954Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Clustering is an important data analysis technology,which aims to divide the dataset into different clusters,and the data objects in the same cluster have high similarity.After many years of research,many clustering algorithms with good performance have been proposed,including partition clustering,hierarchical clustering,density clustering,fuzzy clustering and so on.Clustering also has a wide range of application scenarios,including machine learning,data mining,pattern recognition,information retrieval and so on.Some scenarios require balanced clustering results.However,traditional clustering algorithms usually do not restrict the difference of cluster size between any two clusters in the clustering results.Thus,the balance of the clustering result is often poor,even in the case of producing too large or too small data clusters.Therefore,to ensure the quality of clustering results and further constrain the balance of clusters has important research and application value.In this paper,we focus on the research of balanced constraint clustering algorithm and its optimization.(1)A balanced clustering algorithm based on simulated annealing and greedy strategy is proposed.In order to improve the clustering effectiveness and time performance of the existing balanced clustering algorithm,a low time complexity and efficient balanced clustering algorithm is proposed.In this algorithm,the simulated annealing strategy is used to quickly find the appropriate initial point of balanced clustering in the dataset.As an approximate algorithm,simulated annealing can obtain a solution for a NP-hard problem in a short time by replacing the optimal solution with a high quality solution.Then,based on the initial point selected by simulated annealing,the algorithm uses greedy strategy to form K clusters,which has low time complexity.However,this algorithm can only produce absolutely balanced clustering results,and it is difficult to generate relatively balanced clustering results,especially with user-defined balance.It cannot accurately constrain the maximum value of the size difference between any two clusters in the clustering results to be less than or equal to a certain fixed value.(2)A K-means based ?-balanced clustering algorithm was proposed.In view of the drawback of the existing balanced constraint clustering algorithm,a ?-balanced clustering algorithm is proposed,which can precisely constrain the size difference of any two clusters in the clustering results,where ? represents the maximum difference between any two cluster sizes,which supports user-defined.Then,the algorithm designs a two-layer filter to filter the unnecessary calculation in the data object allocation step,and also optimizes the centroid update step.The experimental results show that ?-balanced clustering algorithm has better performance and time efficiency,and the relevant optimization mechanism further improves the efficiency of the algorithm.(3)Acceleration of balanced clustering algorithm on CPU-GPU platform.With the continuous development of hardware technology,GPU techology has gradually developed from the field of professional graphics computing to the field of general computing.Due to difference of their work principle of the bottom layer,GPU is good at dealing with the computing tasks with repetitive,simple and large-scale characteristics,which is more efficient than the efficiency under CPU.Therefore,it is usually more efficient to coordinate CPU and GPU to process computing tasks at the same time.In the existing work of balance constrain clustering algorithm,there is no optimization work for CPU and GPU heterogeneous computing.In this paper,the basic algorithm proposed in point(2)and its related optimized versions are improved on CPU-GPU heterogeneous platforms.Experimental results show that the efficiency of the algorithm has been greatly improved.
Keywords/Search Tags:Balance Clustering, GPU-CPU Heterogeneous Computing, Parallelization, Data Management
PDF Full Text Request
Related items