
Research On K-MEANS Algorithm Based On GPU Parallel And Its Application In Text Clustering

Posted on: 2019-06-01    Degree: Master    Type: Thesis
Country: China    Candidate: F Wang
GTID: 2348330542955580    Subject: Communication and Information System
Abstract/Summary:
In the era of big data, the Internet generates huge amounts of data every day, and most of it is stored as text. Text mining is an important branch of data mining: it uses text clustering to extract valuable information from massive text collections. Many clustering algorithms can currently be applied to text clustering. The K-means algorithm is a classic clustering algorithm that converges quickly and is easy to implement, so it is widely used in data mining. However, when the dataset is very large or the data dimension is very high, the efficiency of the traditional K-means algorithm suffers. How to speed up K-means has therefore become a new research hotspot.

This thesis systematically studies the basic theory of text clustering and GPU programming, analyzes the shortcomings of the traditional K-means algorithm, and designs a GPU-based parallel K-means algorithm that improves both its speed and its accuracy. The algorithm moves the parallelizable steps of the K-means flow onto the GPU and uses the CUDA parallel programming architecture to calculate the distance from each data point to the cluster centers, instead of computing these distances serially on the CPU as the traditional algorithm does. In addition, the initial cluster centers are selected based on the initial distance values of the data points. This avoids a weakness of the traditional K-means algorithm, whose random choice of initial centers can make the clustering converge to a locally optimal solution.

Finally, a text clustering system based on the parallel K-means algorithm is designed and tested. On different datasets, the GPU-based algorithm running on a GeForce GTX 860M clusters 9 to 16 times faster than the CPU-based K-means algorithm, and the average clustering accuracy increases by 9%. Used in the text clustering system, the GPU parallel K-means algorithm also improves the system's clustering speed and accuracy, and therefore has some practical value.
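The two changes to K-means described above can be illustrated with a minimal sketch. It is written in plain Python rather than CUDA so it runs without a GPU: `assign_kernel` is the work a single GPU thread would do for one data point, mirroring the idea of computing point-to-center distances in parallel, and the sequential loop that calls it stands in for a kernel launch. The abstract does not specify the distance metric or the exact initialization rule, so the sketch assumes squared Euclidean distance and substitutes a deterministic max-min (farthest-point) initialization. All function names are hypothetical, and this is not the thesis's implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance (assumed metric; the abstract does not name one)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points: List[Point], k: int) -> List[List[float]]:
    """Hypothetical distance-based initialization (max-min / farthest-point rule).

    Start from the point farthest from the data mean, then repeatedly add the
    point whose distance to its nearest chosen center is largest. Unlike random
    selection, the result is deterministic. Assumes 1 <= k <= len(points).
    """
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    chosen = [max(range(n), key=lambda i: squared_distance(points[i], mean))]
    # nearest[i] = squared distance from point i to its closest chosen center
    nearest = [squared_distance(p, points[chosen[0]]) for p in points]
    while len(chosen) < k:
        nxt = max(range(n), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(nearest[i], squared_distance(points[i], points[nxt]))
                   for i in range(n)]
    return [list(points[i]) for i in chosen]


def assign_kernel(i: int, points: List[Point], centers: List[List[float]],
                  labels: List[int]) -> None:
    """Work of one (hypothetical) GPU thread: thread i handles data point i.

    It only reads the shared centers and writes labels[i], so all n calls are
    independent and could run concurrently without synchronization.
    """
    best_j, best_d = 0, math.inf
    for j, c in enumerate(centers):
        d = squared_distance(points[i], c)
        if d < best_d:
            best_j, best_d = j, d
    labels[i] = best_j


def update_centers(points: List[Point], labels: List[int],
                   centers: List[List[float]]) -> List[List[float]]:
    """Host-side step: recompute each center as the mean of its assigned points."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, label in zip(points, labels):
        counts[label] += 1
        for d in range(dim):
            sums[label][d] += p[d]
    # An empty cluster keeps its previous center.
    return [[s / counts[j] for s in sums[j]] if counts[j] else list(centers[j])
            for j in range(k)]


def kmeans(points: List[Point], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    centers = farthest_point_init(points, k)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        previous = list(labels)
        # Sequential stand-in for a kernel launch with one thread per point.
        for i in range(len(points)):
            assign_kernel(i, points, centers, labels)
        if labels == previous:  # no point changed cluster: converged
            break
        centers = update_centers(points, labels, centers)
    return labels, centers


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    labels, centers = kmeans(pts, 2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

Only the assignment step is written in the per-point form here, because it is the part whose cost grows with the number of points times the number of centers. For text clustering, each point would typically be a document vector (for example TF-IDF weights); that representation, like the choice of metric, is an assumption rather than a detail given in the abstract.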
Keywords/Search Tags: K-means algorithm, Text clustering, Data mining, Parallel computing