
Research On Classification Algorithms Based On Multicore Computing

Posted on: 2013-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: C Liu
GTID: 2248330362470866
Subject: Computer software and theory
Abstract/Summary:
Computer manufacturing technology is developing rapidly, and research on multicore CPU chips has produced significant results, leading us into the age of multicore computing. As a result, more and more researchers are turning their attention to multicore parallelism, whereas the traditional approach is multicomputer parallelism. Multicore computing is an effective approach that cuts hardware expenditure. Compared with multicomputer systems, multicore systems also save energy, which meets the present-day demand for both energy saving and environmental protection. This project combines multicore computing technology with data mining technology to realize an effective method of data processing.

This thesis focuses on classification, a branch of data mining. In studying multicore classification, it investigates not only how to convert a traditional algorithm into one that can run on a multicore system, but also how to resolve the problems caused by using multicore computing technology.

The main work of this thesis is summarized as follows:

(1) Although the KNN method has many advantages, its fatal problem is that the classification process is very inefficient. To solve this problem, this thesis uses multicore computing technology to improve the algorithm. It approaches the algorithm from two angles, the Data Partition Mode and the Task Partition Mode, and proposes two KNN algorithms based on multicore computing technology: MDKNN and MTKNN. The design idea of MDKNN is to improve execution efficiency by dividing the data set into many subsets and processing them on different CPU cores. MTKNN treats the classification process as a whole task and the prediction for a single test tuple as a sub-task, so that the sub-tasks can execute in parallel on multicore platforms. The experimental results demonstrate that both algorithms maintain the original classification accuracy while greatly improving efficiency.

(2) The most time-consuming part of the decision tree method is the construction of the tree structure. This thesis studies how to speed up that construction with multicore computing technology and proposes a classification algorithm, MPID3, based on the classic ID3 algorithm. Because the traditional algorithm builds the tree recursively, this thesis designs a task queue from which each CPU core gets tasks and to which it adds new ones, so that the new algorithm runs smoothly on multicore platforms. The experimental results demonstrate that the new algorithm maintains the original classification accuracy while greatly improving efficiency.

(3) The most time-consuming part of the Bayesian network method is Bayesian learning. This thesis studies how to speed up Bayesian learning with multicore computing technology and proposes a classification algorithm, MPBN. During Bayesian learning, it divides the parameter learning process into many parts and assigns them to different CPU cores for execution. After each part finishes, the network structure is updated accordingly. Because modeling a Bayesian network structure is very difficult, this thesis uses the PNL library, developed and released by Intel, to create the model. The experimental results demonstrate that the algorithm maintains the original classification accuracy while greatly improving efficiency.
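The two KNN partitioning modes described in item (1) can be illustrated with a minimal Python sketch. The abstract does not give implementation details, so everything below is an assumption made for illustration: the function names (`knn_predict_one`, `mtknn_predict`, `mdknn_predict_one`), Euclidean distance, majority voting, and the use of a thread pool to stand in for CPU cores. In CPython a process pool would be needed for a real speed-up on CPU-bound work; threads are used here only to keep the sketch short.

```python
import heapq
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def knn_predict_one(train, query, k):
    """Serial KNN for one query; train is a list of (features, label) pairs."""
    nearest = heapq.nsmallest(k, train, key=lambda row: math.dist(row[0], query))
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def mtknn_predict(train, queries, k, workers=4):
    """Task partition (assumed reading of MTKNN): one sub-task per test tuple."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda q: knn_predict_one(train, q, k), queries))


def mdknn_predict_one(train, query, k, workers=4):
    """Data partition (assumed reading of MDKNN): split the training set,
    find k local candidates per subset, then merge them into the global k."""
    chunks = [train[i::workers] for i in range(workers)]

    def local_top_k(chunk):
        return heapq.nsmallest(k, chunk, key=lambda row: math.dist(row[0], query))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        candidates = [row for part in pool.map(local_top_k, chunks) for row in part]

    # The global k nearest neighbours are always among the local candidates.
    nearest = heapq.nsmallest(k, candidates, key=lambda row: math.dist(row[0], query))
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Both variants return the same labels as the serial version because each one only reorganizes where the distance computations happen, which is consistent with the abstract's claim that accuracy is maintained.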
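The task-queue idea described in item (2) can be sketched along the following lines. This is not the thesis's MPID3 implementation, which the abstract does not detail. It is a hedged illustration in which the recursive ID3 calls are replaced by node-expansion tasks on a shared queue that worker threads both take from and add to. The names `build_tree`, `best_attribute`, and `classify`, the dictionary-based tree layout, and the use of threads in place of CPU cores are all assumptions.

```python
import math
import queue
import threading
from collections import Counter


def entropy(rows):
    """Shannon entropy of the class labels in rows of (features, label)."""
    counts = Counter(label for _, label in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())


def split(rows, attr):
    """Group rows by their value for attr."""
    groups = {}
    for feats, label in rows:
        groups.setdefault(feats[attr], []).append((feats, label))
    return groups


def best_attribute(rows, attrs):
    """Pick the attribute with the highest information gain (the ID3 criterion)."""
    def gain(attr):
        parts = split(rows, attr).values()
        return entropy(rows) - sum(len(p) / len(rows) * entropy(p) for p in parts)
    return max(attrs, key=gain)


def build_tree(rows, attrs, workers=4):
    """Grow the tree by draining a shared queue of node-expansion tasks."""
    root = {}
    tasks = queue.Queue()
    tasks.put((root, rows, tuple(attrs)))

    def expand():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work
                tasks.task_done()
                return
            node, subset, remaining = item
            labels = Counter(label for _, label in subset)
            if len(labels) == 1 or not remaining:
                node["label"] = labels.most_common(1)[0][0]  # leaf node
            else:
                attr = best_attribute(subset, remaining)
                node["attr"], node["children"] = attr, {}
                rest = tuple(a for a in remaining if a != attr)
                for value, group in split(subset, attr).items():
                    child = node["children"][value] = {}
                    tasks.put((child, group, rest))  # replaces the recursive call
            tasks.task_done()

    threads = [threading.Thread(target=expand) for _ in range(workers)]
    for t in threads:
        t.start()
    tasks.join()  # returns once every queued node has been expanded
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return root


def classify(tree, feats):
    """Walk from the root to a leaf and return its label."""
    while "label" not in tree:
        tree = tree["children"][feats[tree["attr"]]]
    return tree["label"]
```

Each child task is queued before its parent is marked done, so the queue's unfinished count cannot reach zero while work remains. Each node dictionary is written by exactly one worker, so no extra locking is needed in this sketch.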
Keywords/Search Tags: Data Mining, Classification Algorithm, Multicore Computing, Task Scheduling