
GPU-Based Design And Implementation Of A Data Mining Classification Algorithm

Posted on: 2015-08-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Bai
Full Text: PDF
GTID: 2298330467463186
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, computer storage and database technologies have continued to improve, vast amounts of data have accumulated, and the rate of accumulation keeps accelerating. Data mining algorithms are commonly deployed on distributed systems of multiple computers to accelerate the mining process. On a single computer, multi-threaded parallel computing has traditionally been carried out by the CPU. The GPU, by contrast, is a high-performance many-core processor whose number of parallel threads can be far greater than that of a CPU. Against this background, this thesis aims to implement a classification algorithm based on multi-threaded GPU parallel computing.

Logistic regression is an efficient classification algorithm that has been widely used in statistics, the biological sciences, and other fields. Mahout, a widely used machine learning library written in Java, implements a variety of classic classification algorithms. This thesis analyzes the main aspects of high-performance GPU programming in detail, including the software architecture, the programming model, and the memory model. Through an in-depth study of Mahout's logistic regression algorithm, the computations with an inherently parallel nature were identified and moved to the GPU.

This thesis designs and implements the test module of the Mahout classifier on the GPU. First, the key steps of the training and test modules of the logistic regression classifier are analyzed in detail, and the feasibility of accelerating the algorithm with GPU parallel computing is discussed. Second, based on the characteristics of the test module and of multi-threaded execution on the GPU, a test-module algorithm is designed for the GPU architecture. The algorithm has the CPU and GPU work jointly: the CPU handles the algorithm's complex control logic, while the GPU handles the highly parallel computation.
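The per-sample computation at the heart of the logistic regression test module is a dot product between the learned weights and the feature vector, followed by the logistic (sigmoid) function. A minimal sketch of that step, with illustrative class and method names (not Mahout's actual API), might look like:

```java
// Minimal sketch of the per-sample computation in the test
// (classification) phase of logistic regression. Names are
// illustrative, not Mahout's actual API.
public class LogisticScorer {
    private final double[] weights; // learned model parameters

    public LogisticScorer(double[] weights) {
        this.weights = weights;
    }

    // Dot product of weights and features: each sample's score is
    // independent of the others, which is what makes the test phase
    // a natural target for GPU-style data parallelism.
    public double score(double[] features) {
        double z = 0.0;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z)); // logistic (sigmoid) function
    }

    // Threshold the probability at 0.5 for a binary class label.
    public int classify(double[] features) {
        return score(features) >= 0.5 ? 1 : 0;
    }
}
```

Because each sample is scored independently, a GPU can assign samples (or vector elements) to separate threads, while the CPU keeps the surrounding control flow.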
To make the algorithm fit the GPU's parallel architecture, the vector multiplication was redesigned. In addition, for greater efficiency, the vector data to be processed is stored separately using global memory and zero-copy memory. Finally, the algorithm was tested on multiple data sets with different characteristics. The results show that the algorithm improves classification speed on most of the data sets, and on some of them the speedup is significant.
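Redesigning vector multiplication for a GPU typically means splitting it into independent element-wise multiplies, one per thread, followed by a sum reduction. A CPU-side sketch of that pattern, using Java parallel streams as a stand-in for CUDA threads (illustrative only, not the thesis's actual CUDA kernel):

```java
import java.util.stream.IntStream;

// Sketch of the data-parallel dot product pattern that maps onto
// GPU threads: every index is an independent multiply, and the
// partial products are then combined by a reduction. Java parallel
// streams stand in for CUDA threads here.
public class ParallelDot {
    public static double dot(double[] a, double[] b) {
        return IntStream.range(0, a.length)
                        .parallel()                    // one "thread" per element
                        .mapToDouble(i -> a[i] * b[i]) // independent multiplies
                        .sum();                        // reduction over partials
    }
}
```

On an actual GPU the same structure appears as a kernel launch over the element indices plus a reduction, with the input vectors placed in global or zero-copy (host-mapped) memory depending on their access pattern.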
Keywords/Search Tags:classification algorithm, logistic regression, CUDA, GPU