Design And Implementation On The GPU Of Bayesian Text Classifying Algorithm

Posted on: 2015-10-27; Degree: Master; Type: Thesis
Country: China; Candidate: C P Yang; Full Text: PDF
GTID: 2298330467463540; Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the mobile Internet and enterprise informatization, more and more information on the Internet is stored as text. Extracting valuable information from this massive volume of data has become both a challenge and an urgent problem, and it gave rise to the research field of text mining. Text mining can be used to retrieve or filter useful information from massive collections of unlabeled documents. The efficiency of text mining algorithms is closely tied to the dimensionality of the data and the size of the data set: when the data grows too large, performance hits a bottleneck, and data mining algorithms running on a single CPU can no longer meet user demand.
Based on the principle of the naive Bayesian classification algorithm, the architecture of the GPU, and the CUDA (Compute Unified Device Architecture) programming model, this thesis designs a parallel naive Bayesian classification system that carries out classification work in parallel. By making full use of the computing power of the GPU, the system improves the efficiency of text data mining. The main work of the thesis is as follows.
First, the thesis investigates the principle of the naive Bayesian algorithm, the GPU architecture, and the CUDA programming model; divides the naive Bayesian algorithm into several steps; identifies the steps that can be executed in parallel; and then designs a parallel naive Bayesian classification system. The system contains five modules: a preparation module, a text training module, a text classifying module, a classification result evaluation module, and a classification result feedback module. The modifications in this thesis focus on the text training module and the text classifying module.
Finally, the thesis optimizes the implementation for efficiency based on the GPU architecture. Tests on four different data sets, run on a combined CPU and GPU architecture, show that the parallel text classification system implemented in this thesis achieves a good speedup.
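To make the training and classifying steps concrete, here is a minimal CPU-only sketch of a multinomial naive Bayes text classifier in Python. It is not the thesis's implementation: the function names (`train`, `score`, `classify`, `classify_batch`), the whitespace tokenizer, and the use of add-one (Laplace) smoothing are all assumptions made for illustration. The point it tries to show is that each (document, class) score depends only on the trained counts, so the scores are independent of one another and could plausibly be assigned to separate GPU threads.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Training step: count documents per class and words per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in labeled_docs:
        tokens = text.lower().split()  # hypothetical tokenizer: whitespace split
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def score(tokens, label, model):
    """Log-posterior (up to a constant) of one class for one document.

    Each call reads only the trained counts, so calls for different
    documents or classes do not depend on each other.
    """
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    total_words = sum(word_counts[label].values())
    log_p = math.log(class_counts[label] / total_docs)
    for tok in tokens:
        # Add-one (Laplace) smoothing: an assumption, not taken from the thesis.
        count = word_counts[label][tok]
        log_p += math.log((count + 1) / (total_words + len(vocab)))
    return log_p


def classify(text, model):
    """Classifying step: pick the class with the highest score."""
    tokens = text.lower().split()
    labels = sorted(model[0])
    return max(labels, key=lambda label: score(tokens, label, model))


def classify_batch(texts, model):
    """Sequential here; each element is an independent unit of work."""
    return [classify(text, model) for text in texts]


if __name__ == "__main__":
    training_data = [
        ("sports", "the team won the match"),
        ("sports", "a great goal in the game"),
        ("tech", "the gpu runs cuda kernels"),
        ("tech", "parallel computing on the gpu"),
    ]
    model = train(training_data)
    print(classify_batch(["gpu kernels", "the team scored a goal"], model))
```

In this sketch the loop in `classify_batch` runs sequentially. A GPU version would need the counts stored in flat arrays in device memory and a kernel that computes one score per thread or per block; how the thesis actually lays out that data is not stated in the abstract.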
Keywords/Search Tags: document classifying, naive Bayesian, CUDA, parallel computing