
Programmable Manycore Accelerator for Machine Learning, Convolution Neural Network and Binary Neural Network

Posted on: 2018-08-02
Degree: M.S
Type: Thesis
University: University of Maryland, Baltimore County
Candidate: Kulkarni, Adwaya Amey
Full Text: PDF
GTID: 2448390005451575
Subject: Computer Engineering
Abstract/Summary:
Lightweight Machine Learning (ML) and Convolutional Neural Network (CNN) kernels can offer solutions for wearable cognitive devices and resource-constrained Internet of Things (IoT) platforms. However, implementations of ML and CNN kernels are computationally intensive and face memory storage issues on tiny embedded platforms. In recent years, heterogeneous hardware acceleration, in which compute-intensive tasks are performed on kernel-specific cores, has gained attention, with growing industry interest in developing tiny, lightweight manycore accelerators that address these issues. In this thesis, we propose two extended versions of an existing manycore architecture, "PENC: Power Efficient Nano Cluster", which can efficiently implement common ML and CNN kernels with much-reduced computation and memory complexity. First, we propose "PACENet: Programmable many-core ACcElerator", which adds CNN-specific instructions for frequently used kernels such as convolution, the ReLU activation function (RELU), and max-pooling (MP), as well as a machine-learning-specific instruction for Manhattan distance calculation (MNT). Second, we propose "BiNMAC: Binarized Neural network Manycore ACcelerator", which implements binary neural networks. Reducing weights to a binary format not only relieves the memory-access bottleneck but also reduces computation, since most arithmetic operations are replaced with bit-wise operations. To add binarized-CNN capability, we implemented instructions such as Batch XOR and XNOR, PCNT (population count), PCH (patch selection), and BCAST (a communication-based instruction) in the existing instruction set hardware. Both PACENet and BiNMAC cores were fully synthesized and placed and routed using TSMC 65 nm CMOS technology. A single PACENet processing core occupies 98.7 µm² and consumes 32.2 mW operating at 1 GHz and 1 V, while a single BiNMAC core occupies 97.9 µm² and consumes 31.1 mW.
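To illustrate the binarization idea described above (not the BiNMAC hardware itself), the following minimal Python sketch shows how a dot product of two {-1, +1} vectors, packed as bit-strings, reduces to an XNOR followed by a population count — the roles played by the XNOR and PCNT instructions:

```python
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two length-n vectors over {-1, +1}, each packed
    into an integer (bit = 1 encodes +1, bit = 0 encodes -1).
    XNOR marks positions where the vectors agree; the population
    count of agreements then gives the dot product directly."""
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask   # 1 wherever the two bits agree
    matches = bin(xnor).count("1")     # population count (PCNT)
    return 2 * matches - n             # agreements minus disagreements


# a = [+1, -1, +1, -1] -> 0b0101, b = [+1, +1, +1, -1] -> 0b0111
print(binary_dot(0b0101, 0b0111, 4))   # dot product is +1 -1 +1 +1 = 2
```

A multiply-accumulate over n elements thus collapses to one XNOR and one popcount per machine word, which is the source of the computation savings the abstract refers to.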
Compared to the existing PENC manycore architecture at 1 GHz, PACENet achieves a 13.3% area reduction and a 14.1% power reduction, while BiNMAC achieves a 17.1% area reduction and a 13.2% power reduction. To conclude this work, we also evaluated the performance of the PACENet and BiNMAC accelerators on personalized biomedical applications, namely stress detection and seizure detection, and on a computer vision application, namely object detection. The stress detection and seizure detection applications are evaluated on the ARL dataset and the Boston hospital dataset using the K-nearest neighbor algorithm. Compared to the PENC manycore, the proposed PACENet shows a 59% increase in throughput and a 43.7% reduction in energy consumption for stress detection, and a 60% throughput improvement and a 43.6% energy reduction for seizure detection. For the computer vision application, we evaluated a ResNet-20 network trained on the CIFAR-10 dataset on both PACENet and BiNMAC. PACENet achieves 2.3x higher throughput per watt and 57.3% lower energy consumption than the PENC manycore. For the SensorNet implementation, the proposed BiNMAC achieves 1.8x higher throughput and consumes 13x less energy than the PENC manycore, while its ResNet-20 implementation achieves 36x higher throughput while consuming 195x less energy.
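The biomedical evaluations above use K-nearest neighbor classification with the accelerator's Manhattan-distance (MNT) primitive. As a reference-only sketch, with hypothetical data and not the thesis's actual pipeline, a KNN classifier built on Manhattan distance looks like this:

```python
def manhattan(x, y):
    """Manhattan (L1) distance: the operation the MNT instruction accelerates."""
    return sum(abs(a - b) for a, b in zip(x, y))

def knn_predict(train, labels, query, k=3):
    """Classify query by majority vote among its k nearest training
    points under Manhattan distance."""
    order = sorted(range(len(train)), key=lambda i: manhattan(train[i], query))
    votes = [labels[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


# Toy example (hypothetical data, not the ARL or Boston hospital sets):
train = [(0, 0), (0, 1), (5, 5), (6, 5)]
labels = ["rest", "rest", "stress", "stress"]
print(knn_predict(train, labels, (5, 6)))   # -> "stress"
```

Because L1 distance needs only subtraction, absolute value, and accumulation, it maps to simple integer datapaths, which is why a dedicated MNT instruction pays off on a lightweight core.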
Keywords/Search Tags:Network, Manycore, Machine learning, CNN, Convolution, Higher throughput, Energy, Binary