
Research On Accelerating Of K-means Clustering Algorithm Using FPGA Based On MapReduce

Posted on: 2017-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: M L Yang
Full Text: PDF
GTID: 2348330509459729
Subject: Computer application technology
Abstract/Summary:
In the era of big data, the Internet produces large amounts of data every day, and data mining and machine learning algorithms can be used to extract value from it. K-means is one of the most widely used clustering algorithms: given a specified number of clusters K, it partitions a dataset into K clusters. The algorithm is simple and converges quickly, but on large datasets its heavy computation becomes a bottleneck, and a serial program running on a single core of a single machine can no longer meet the requirements. To solve these problems, we present an acceleration system that uses MapReduce on a cluster architecture of CPUs and FPGAs, applying parallel processing to speed up K-means clustering on large amounts of data.

First, we analyze the K-means clustering algorithm to find its most time-consuming calculations, and then design a K-means implementation using the MapReduce parallel programming model. Based on the characteristics of K-means, after defining the tasks of the map and reduce processes, we add a combine process to reduce the volume of intermediate results produced by the map process. We plug several FPGA accelerator cards into a single computing node over the PCI-express bus, and a driver transfers the most time-consuming computing tasks to the FPGAs. Pipelining the function modules and parallelizing the internals of each function module on the FPGA chip makes the computation much faster.

The Map accelerator's processing logic comprises an interface section and a computation section. The interface section consists of a PCIe interface module, a hardware platform interface module, a data sending module, and a data receiving module; the computation section consists of a Map packet parsing module, a multi-map calculation module, a scheduling module, and a combine calculation module.
The Reduce accelerator is the same as the Map accelerator except for its computation section, which is composed of a Reduce packet parsing module, a document count accumulation module, and a document vector accumulation module. All processing logic is implemented in the Verilog hardware description language.

Finally, all of the relevant functional modules are simulated. After overall co-simulation, we download the entire processing logic onto the FPGA for testing. The experimental results show that the processing logic is correct. Comparing the K-means implementation on the traditional Hadoop 2.0 platform with the one on the new architecture verifies the feasibility and performance advantages of the new architecture.
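
The map, combine, and reduce decomposition described in the abstract can be illustrated in software. The following is a minimal Python sketch of one K-means iteration, not the thesis's Verilog design. All function names (`map_phase`, `combine_phase`, `reduce_phase`, `kmeans_iteration`) are hypothetical, and the choice of squared Euclidean distance and the handling of empty clusters are assumptions, since the abstract does not specify them. The count and vector-sum accumulation in `reduce_phase` is meant to loosely mirror the document count and document vector accumulation modules.

```python
from collections import defaultdict


def squared_distance(a, b):
    # Assumed metric: squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(points, centroids):
    """Emit (index of nearest centroid, point) for every input point."""
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda k: squared_distance(p, centroids[k]))
        yield nearest, p


def combine_phase(pairs, dim):
    """Fold one split's map output into a (count, vector sum) per cluster.

    This shrinks the intermediate data passed on to the reduce step.
    """
    partial = defaultdict(lambda: [0, [0.0] * dim])
    for k, p in pairs:
        partial[k][0] += 1
        partial[k][1] = [s + x for s, x in zip(partial[k][1], p)]
    return {k: (count, total) for k, (count, total) in partial.items()}


def reduce_phase(partials, centroids):
    """Merge partial results from all splits and compute new centroids."""
    dim = len(centroids[0])
    counts = [0] * len(centroids)
    sums = [[0.0] * dim for _ in centroids]
    for partial in partials:
        for k, (count, total) in partial.items():
            counts[k] += count
            sums[k] = [a + b for a, b in zip(sums[k], total)]
    # Assumption: a cluster that received no points keeps its old centroid.
    return [
        [v / counts[k] for v in sums[k]] if counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]


def kmeans_iteration(splits, centroids):
    """Run map and combine per split, then one global reduce."""
    dim = len(centroids[0])
    partials = [combine_phase(map_phase(split, centroids), dim)
                for split in splits]
    return reduce_phase(partials, centroids)
```

In this sketch each split stands in for the data handled by one map task. Because the combine step forwards only one count and one vector sum per cluster, the data sent to the reduce step scales with K rather than with the number of input points.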
Keywords/Search Tags:Large amount of data, MapReduce, FPGA, K-means, PCI-express bus