A fast and scalable hardware architecture for K-means clustering for big data analysis

Posted on:2017-01-05

Degree:M.S.E.E

Type:Thesis

University:University of Colorado at Colorado Springs

Candidate:Raghavan, Ramprasad

GTID:2458390008979782

Subject:Engineering

Abstract/Summary:

The exponential growth of complex, heterogeneous, dynamic, and unbounded data, generated by a variety of fields including health, genomics, physics, and climatology, poses significant challenges for data processing and for achieving the desired speed-performance. Existing processor-based (software-only) algorithms are incapable of analyzing and processing this enormous amount of data efficiently and effectively. Consequently, some kind of hardware support is desirable to overcome the challenges in analyzing big data. Our objective is to provide hardware support for big data analysis to satisfy the associated constraints and requirements.

Big data analytics involves many important data mining tasks including clustering, which categorizes data into meaningful groups based on the similarity or dissimilarity among objects. In this research work, we investigate and propose a customized hardware architecture for K-means clustering, one of the most popular clustering algorithms. Our hardware design can execute multiple computations in parallel to significantly enhance the speed-performance of the algorithm, by exploiting the inherent parallelism and pipelining nature of the operations.

We design and develop our hardware architecture on a Field Programmable Gate Array (FPGA)-based development platform. Experiments are performed to compare the proposed hardware design with its software counterpart running on an embedded processor on the same development platform. Different hardware configurations (consisting of varying numbers of parallel processing elements) are evaluated on varying data sizes. Our hardware configuration consisting of 32 parallel processing elements (PEs) executes up to 150 times faster than the software-only solution executed by the processor. It is observed that the speed-performance further increases with the number of parallel PEs as well as with the size of the data.

These investigations demonstrate that hardware support for clustering algorithms is not only feasible but also crucial to meet the requirements and constraints associated with analyzing and processing big data. Our proposed hardware architecture is generic and parameterized. It is scalable to support larger and varying datasets as well as a varying number of clusters.
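
The parallelism that the abstract refers to can be illustrated with a small software sketch. This is not the thesis's hardware design, only a minimal Python model written under stated assumptions: the distance metric is taken to be squared Euclidean (the abstract does not name one), the data is split evenly across a hypothetical number of processing elements, and the names `process_chunk`, `kmeans`, and `num_pes` are invented for illustration. Each chunk is handled independently, which is the property a parallel architecture could exploit; here the chunks are simply processed one after another.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed metric; the abstract names none)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def process_chunk(chunk: List[Point], centroids: List[Point]):
    """Work of one hypothetical PE: assign each point in the chunk to its
    nearest centroid and accumulate per-cluster coordinate sums and counts."""
    dims = len(centroids[0])
    sums = [[0.0] * dims for _ in centroids]
    counts = [0] * len(centroids)
    for point in chunk:
        nearest = min(
            range(len(centroids)),
            key=lambda k: squared_distance(point, centroids[k]),
        )
        counts[nearest] += 1
        for d in range(dims):
            sums[nearest][d] += point[d]
    return sums, counts


def kmeans(points: List[Point], centroids: List[Point],
           num_pes: int = 4, max_iters: int = 100) -> List[Point]:
    """K-means with the assignment step split across num_pes chunks."""
    centroids = [tuple(c) for c in centroids]
    # Strided split: chunk i receives points i, i + num_pes, i + 2*num_pes, ...
    chunks = [points[i::num_pes] for i in range(num_pes)]
    for _ in range(max_iters):
        # Chunks do not depend on each other, so parallel hardware could
        # process them concurrently; this sketch runs them sequentially.
        partials = [process_chunk(chunk, centroids) for chunk in chunks]
        new_centroids = []
        for k, old in enumerate(centroids):
            count = sum(cnt[k] for _, cnt in partials)
            if count == 0:
                new_centroids.append(old)  # empty cluster keeps its centroid
                continue
            new_centroids.append(tuple(
                sum(s[k][d] for s, _ in partials) / count
                for d in range(len(old))
            ))
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids
```

Because each chunk only produces partial sums and counts that are merged afterwards, the resulting centroids are the same for any choice of `num_pes` (up to floating-point summation order), so in this model the number of processing elements affects only how the work is divided.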

Keywords/Search Tags:

Data, Hardware, Clustering, Support, Varying

Related items

1	Research On Varying-Density Spatial Clustering In High-Dimensional Data
2	Software and Hardware Support for Data-Race Exceptions
3	Decision Support System In Telecom Ip Clustering Algorithms, Applications And Research
4	An Improved Algorithm For Support Vector Clustering
5	Hardware support for quality-of-service guarantees in packet switched networks
6	Research On Machine Learning Based Hardware Trojan Detection Method
7	Research On Projected Clustering Algorithm And Its Applications
8	A Study On Algorithm For Classification Based On Support Vector Data Description
9	Varying-scale Support Vector Regression Method
10	Using K-Mean And SVM To Build Hybrid Methodology To Classify Diseases