A fast and scalable hardware architecture for K-means clustering for big data analysis

Posted on: 2017-01-05
Degree: M.S.E.E
Type: Thesis
University: University of Colorado at Colorado Springs
Candidate: Raghavan, Ramprasad
Full Text: PDF
GTID: 2458390008979782
Subject: Engineering
Abstract/Summary:
The exponential growth of complex, heterogeneous, dynamic, and unbounded data, generated by a variety of fields including health, genomics, physics, and climatology, poses significant challenges for data processing and the desired speed-performance. Existing processor-based (software-only) algorithms cannot analyze and process this enormous amount of data efficiently and effectively. Consequently, hardware support is desirable to overcome the challenges in analyzing big data. Our objective is to provide hardware support for big data analysis that satisfies the associated constraints and requirements.

Big data analytics involves many important data mining tasks, including clustering, which categorizes data into meaningful groups based on the similarity or dissimilarity among objects. In this research work, we investigate and propose a customized hardware architecture for K-means clustering, one of the most popular clustering algorithms. Our hardware design executes multiple computations in parallel and significantly enhances the speed-performance of the algorithm by exploiting the inherent parallelism and pipelined nature of its operations.

We design and develop our hardware architecture on a Field Programmable Gate Array (FPGA)-based development platform. Experiments are performed to evaluate the proposed hardware design against its software counterpart running on an embedded processor on the same development platform. Different hardware configurations (with varying numbers of parallel processing elements) are evaluated on varying data sizes. Our hardware configuration with 32 parallel processing elements (PEs) executes up to 150 times faster than the software-only solution running on the processor. The speed-performance increases further with the number of parallel PEs as well as with the size of the data.

These investigations demonstrate that hardware support for clustering algorithms is not only feasible but also crucial to meet the requirements and constraints associated with analyzing and processing big data. Our proposed hardware architecture is generic and parameterized. It scales to support larger and varying datasets as well as a varying number of clusters.
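
The parallelism the abstract refers to can be illustrated with a small software model. The sketch below is not the thesis's hardware design; it is a plain-Python approximation, assuming the standard Lloyd-style K-means iteration and assuming that the data points are split into contiguous chunks, one per processing element. The names `assign_chunk`, `kmeans`, and `num_pes` are hypothetical and chosen only for illustration. Each chunk is labeled independently of the others, which is the property that lets a hardware design process the chunks at the same time.

```python
# Illustrative software model only (assumption: standard Lloyd-style K-means).
# The names assign_chunk, kmeans, and num_pes are hypothetical, not from the thesis.

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_chunk(points, centroids):
    """Work of one modeled PE: label each point with its nearest centroid index."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


def kmeans(points, centroids, num_pes=4, max_iters=100):
    """Run K-means, splitting the assignment step into num_pes independent chunks.

    The chunks are processed sequentially here; in hardware they could run
    concurrently because no chunk depends on another chunk's result.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    chunk = -(-len(points) // num_pes)  # ceiling division: points per modeled PE
    for _ in range(max_iters):
        # Assignment step: each modeled PE handles one contiguous slice.
        labels = []
        for pe in range(num_pes):
            labels.extend(assign_chunk(points[pe * chunk:(pe + 1) * chunk], centroids))

        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid unchanged

        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(data, [(0, 0), (10, 10)], num_pes=2))
```

Because the labels depend only on the current centroids, changing `num_pes` changes how the work is divided but not the clustering result. This model says nothing about the measured speedup, which depends on the actual FPGA implementation.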
Keywords/Search Tags:Data, Hardware, Clustering, Support, Varying