
Programmable Manycore Accelerator for Machine Learning, Convolution Neural Network and Binary Neural Network

Posted on: 2018-08-02
Degree: M.S
Type: Thesis
University: University of Maryland, Baltimore County
Candidate: Kulkarni, Adwaya Amey
Full Text: PDF
GTID: 2448390005451575
Subject: Computer Engineering
Abstract/Summary:
Lightweight Machine Learning (ML) and Convolutional Neural Network (CNN) kernels can offer solutions for wearable cognitive devices and resource-constrained Internet of Things (IoT) platforms. However, implementations of ML and CNN kernels are computationally intensive and face memory storage issues on tiny embedded platforms. In recent years, heterogeneous hardware acceleration, in which compute-intensive tasks are performed on kernel-specific cores, has gained attention, with growing industry interest in developing tiny, lightweight manycore accelerators that address these issues. In this thesis, we propose two extended versions of an existing manycore architecture, "PENC: Power Efficient Nano Cluster", which can efficiently implement common ML and CNN kernels with much-reduced computation and memory complexity. First, we propose "PACENet: Programmable many-core ACcElerator", which adds CNN-specific instructions for frequently used kernels such as convolution, the ReLU activation function (RELU), and max-pooling (MP), as well as a machine-learning-specific instruction for Manhattan distance calculation (MNT). Second, we propose "BiNMAC: Binarized Neural network Manycore ACcelerator", which implements binary neural networks. Reducing weights to a binary format not only relieves the memory-access bottleneck but also reduces computation, since most arithmetic operations are replaced with bit-wise operations. To add binarized-CNN capability, we implemented instructions such as Batch XOR and XNOR, PCNT (population count), PCH (patch selection), and BCAST (a communication-based instruction) in the existing instruction set hardware. Both PACENet and BiNMAC cores were fully synthesized and placed and routed using TSMC 65 nm CMOS technology. A single PACENet processing core occupies 98.7 µm² and consumes 32.2 mW operating at 1 GHz and 1 V, while a single BiNMAC core occupies 97.9 µm² and consumes 31.1 mW.
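To illustrate the binarization idea described above (not the BiNMAC hardware itself), the following minimal Python sketch shows how a dot product of two {-1, +1} vectors, packed as bit-strings, reduces to an XNOR followed by a population count — the roles played by the XNOR and PCNT instructions:

```python
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two length-n vectors over {-1, +1}, each packed
    into an integer (bit = 1 encodes +1, bit = 0 encodes -1).
    XNOR marks positions where the vectors agree; the population
    count of agreements then gives the dot product directly."""
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask   # 1 wherever the two bits agree
    matches = bin(xnor).count("1")     # population count (PCNT)
    return 2 * matches - n             # agreements minus disagreements


# a = [+1, -1, +1, -1] -> 0b0101, b = [+1, +1, +1, -1] -> 0b0111
print(binary_dot(0b0101, 0b0111, 4))   # dot product is +1 -1 +1 +1 = 2
```

A multiply-accumulate over n elements thus collapses to one XNOR and one popcount per machine word, which is the source of the computation savings the abstract refers to.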
Compared to the existing PENC manycore architecture at 1 GHz, PACENet achieves a 13.3% area reduction and a 14.1% power reduction, while BiNMAC achieves a 17.1% area reduction and a 13.2% power reduction. To conclude this work, we also evaluated the performance of the PACENet and BiNMAC accelerators on personalized biomedical applications, namely stress detection and seizure detection, and on a computer vision application, namely object detection. The stress detection and seizure detection applications are evaluated on the ARL dataset and the Boston hospital dataset using the K-nearest neighbor algorithm. Compared to the PENC manycore, the proposed PACENet shows a 59% increase in throughput and a 43.7% reduction in energy consumption for stress detection, and a 60% throughput improvement and a 43.6% energy reduction for seizure detection. For the computer vision application, we evaluated a ResNet-20 network trained on the CIFAR-10 dataset on both PACENet and BiNMAC. PACENet achieves 2.3x higher throughput per watt and 57.3% lower energy consumption than the PENC manycore. For the SensorNet implementation, the proposed BiNMAC achieves 1.8x higher throughput and consumes 13x less energy than the PENC manycore, while its ResNet-20 implementation achieves 36x higher throughput while consuming 195x less energy.
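The biomedical evaluations above use K-nearest neighbor classification with the accelerator's Manhattan-distance (MNT) primitive. As a reference-only sketch, with hypothetical data and not the thesis's actual pipeline, a KNN classifier built on Manhattan distance looks like this:

```python
def manhattan(x, y):
    """Manhattan (L1) distance: the operation the MNT instruction accelerates."""
    return sum(abs(a - b) for a, b in zip(x, y))

def knn_predict(train, labels, query, k=3):
    """Classify query by majority vote among its k nearest training
    points under Manhattan distance."""
    order = sorted(range(len(train)), key=lambda i: manhattan(train[i], query))
    votes = [labels[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


# Toy example (hypothetical data, not the ARL or Boston hospital sets):
train = [(0, 0), (0, 1), (5, 5), (6, 5)]
labels = ["rest", "rest", "stress", "stress"]
print(knn_predict(train, labels, (5, 6)))   # -> "stress"
```

Because L1 distance needs only subtraction, absolute value, and accumulation, it maps to simple integer datapaths, which is why a dedicated MNT instruction pays off on a lightweight core.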
Keywords/Search Tags:Network, Manycore, Machine learning, CNN, Convolution, Higher throughput, Energy, Binary