Research On Accelerating Of K-means Clustering Algorithm Using FPGA Based On MapReduce

Posted on:2017-02-03

Degree:Master

Type:Thesis

Country:China

Candidate:M L Yang

GTID:2348330509459729

Subject:Computer application technology

Abstract/Summary:

In the era of big data, the Internet produces large amounts of data every day, and data mining and machine learning algorithms can be used to extract value from it. K-means is one of the most widely used methods in cluster analysis: it partitions a dataset into a specified number of clusters, K. The algorithm is simple and converges quickly. On large datasets, however, K-means hits a computational bottleneck, and a serial program running on a single core of a single machine can no longer meet the requirements. To address these problems, we present an acceleration system that uses MapReduce on a cluster architecture of CPUs and FPGAs, applying parallel processing to speed up K-means clustering on large amounts of data.

First, we analyze the K-means clustering algorithm to find its most time-consuming calculations, and then design a K-means implementation on the MapReduce parallel programming model. Based on the characteristics of K-means, after defining the tasks of the map and reduce processes, we add a combine process to reduce the intermediate results produced by the map process. Several FPGA accelerator cards are plugged into a single computing node over the PCI-express bus, and a driver transfers the most time-consuming computing tasks to the FPGAs. Pipelining the function modules and parallelizing the internals of each module in the FPGA chip makes the computation much faster.

The Map accelerator's processing logic comprises an interface section and a computation section. The interface section consists of a PCIe interface module, a hardware platform interface module, a data sending module, and a data receiving module. The computation section consists of a Map packet parsing module, a multi-map calculation module, a scheduling module, and a combine calculation module. The Reduce accelerator is the same as the Map accelerator except for its computation section, which is composed of a Reduce packet parsing module, a document count accumulation module, and a document vector accumulation module. All processing logic is implemented in the Verilog hardware description language.

Finally, all relevant functional modules are simulated. After overall co-simulation, we download the entire processing logic to the FPGA for testing. The experimental results show that the processing logic is correct. Comparing the implementation of K-means on a traditional Hadoop 2.0 platform with the implementation on the new architecture verifies the feasibility and performance advantages of the new architecture.
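
The map, combine, and reduce division described in the abstract can be illustrated with a small software sketch of one K-means iteration. This is not the thesis's Verilog logic or its Hadoop code; the function names (`nearest`, `map_and_combine`, `reduce_partials`, `kmeans_iteration`) and the in-memory list of splits are assumptions made for illustration. The map step assigns each point to its nearest centroid, the combine step collapses each split into one partial (vector sum, count) pair per cluster, and the reduce step accumulates those pairs, which is plausibly what the count and vector accumulation modules do in hardware.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Partial = Tuple[List[float], int]  # (component-wise sum, number of points)


def nearest(point: Sequence[float], centroids: Sequence[Vector]) -> int:
    """Return the index of the centroid closest to `point` (squared Euclidean)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda k: sq_dist(point, centroids[k]))


def map_and_combine(split: Sequence[Vector],
                    centroids: Sequence[Vector]) -> Dict[int, Partial]:
    """Map each point to a cluster, then combine locally so that only one
    (sum, count) pair per cluster leaves this split."""
    partial: Dict[int, Partial] = {}
    for point in split:
        k = nearest(point, centroids)                         # map step
        sums, count = partial.get(k, ([0.0] * len(point), 0))
        for i, x in enumerate(point):                         # combine step
            sums[i] += x
        partial[k] = (sums, count + 1)
    return partial


def reduce_partials(partials: Sequence[Dict[int, Partial]],
                    centroids: Sequence[Vector]) -> List[Vector]:
    """Accumulate partial sums and counts per cluster and emit new centroids."""
    totals: Dict[int, Partial] = {}
    for partial in partials:
        for k, (sums, count) in partial.items():
            t_sums, t_count = totals.get(k, ([0.0] * len(sums), 0))
            for i, s in enumerate(sums):
                t_sums[i] += s
            totals[k] = (t_sums, t_count + count)
    new_centroids: List[Vector] = []
    for k, old in enumerate(centroids):
        if k in totals:
            sums, count = totals[k]
            new_centroids.append(tuple(s / count for s in sums))
        else:
            new_centroids.append(tuple(old))  # empty cluster: keep old centroid
    return new_centroids


def kmeans_iteration(splits: Sequence[Sequence[Vector]],
                     centroids: Sequence[Vector]) -> List[Vector]:
    """One iteration: each split stands in for one map task's input."""
    partials = [map_and_combine(split, centroids) for split in splits]
    return reduce_partials(partials, centroids)


if __name__ == "__main__":
    splits = [[(0.0, 0.0), (10.0, 10.0)], [(0.0, 2.0), (10.0, 12.0)]]
    centroids = [(1.0, 1.0), (9.0, 9.0)]
    print(kmeans_iteration(splits, centroids))
```

The point of the combine step is that each split sends at most K partial pairs to the reduce side instead of one record per data point, which is why it cuts the volume of intermediate results.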

Keywords/Search Tags:

Large amount of data, MapReduce, FPGA, K-means, PCI-express bus

Related items

1	Research On Mapreduce Based Big Data K-means Clustering Algorithm
2	Study And Implementation On Full-text Search Engine Based On LUCENE Under The Large Amount Of Data
3	No Default Categories For Large Amount Of Data Clustering Algorithm Research
4	Research On Optimization Of Express Transportation Service Network Considering Minimum Tail Quantity
5	Research On K-Means Algorithm Based On MapReduce
6	Technology And Research Of High-Speed Data Transfer Base On PCI Express
7	Research On Parallel Clustering Algorithm For Large-Scale Data Set
8	Research On High-Speed Data Exchange Technology With PCI Express Based On FPGA
9	Design And Implementation Of High Speed Data Acquisition And Transmission System Based On PCI-Express 3.0
10	Design And Implementation Of High-Speed Data Transfer System Base On PCI Express And DDR2 SDRAM