Research On Hardware Acceleration Based On FPGA Of Convolutional Neural Network And Elliptic Curve Algorithm

Posted on: 2021-04-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X H Hu
GTID: 1368330602493449
Subject: Control Science and Engineering
Abstract/Summary:
As Moore's law gradually loses force, performance gains from software-based acceleration have hit a bottleneck. For emerging applications that are both computation-intensive and data-intensive in particular, software running on a central processing unit (CPU) can no longer keep up. Hardware acceleration can meet these needs because it devotes abundant resources to computation and comparatively little to control flow. This dissertation focuses on hardware acceleration of two workloads, elliptic curve cryptography (ECC) and convolutional neural networks (CNN), both of which have heavy computational loads and high complexity and are therefore well suited as research objects. The modular operations and point operations of ECC work on long operands, usually 256 bits, so ECC is a computation-intensive task with large operand width. The convolution in a CNN consists of a large number of repeated multiplications and additions, so it is a highly repetitive, data-intensive task. These two kinds of task make ECC and CNN strong candidates for hardware acceleration. In information security and deep learning respectively, ECC and CNN are among the most popular cryptographic algorithms and network types, so studying their hardware acceleration has both research significance and practical value. This dissertation studies several key problems in hardware acceleration schemes for ECC and CNN. Its main work and contributions are as follows.

(1) Aiming at low power consumption, this dissertation reviews the advantages and disadvantages of existing ECC hardware architectures and proposes a low-power, adder-based architecture. Because using fewer hardware resources lowers power consumption, the design minimizes the number of adders. First, without loss of performance, the Interleaved Modular Multiplication algorithm is optimized to reduce the number of adders from three to two, and the Binary Modular Inversion algorithm is improved to reduce the number of adders from four to two. Second, through hardware reuse, all modular operations (modular addition, modular subtraction, modular multiplication and modular inversion) share just two adders. Finally, to keep the adders fully utilized, pipelining is applied to the point addition and point doubling algorithms, and the scheduling order of the modular operations is optimized to make point multiplication more efficient. To improve the security of the low-power architecture, a point multiplication algorithm resistant to simple power analysis (SPA) attacks is used. The architecture is implemented on a Xilinx Virtex-4. Compared with other architectures, it saves 17.58%-74.80% of Slice resources.

(2) Aiming at high performance, this dissertation analyzes the advantages and disadvantages of existing ECC hardware architectures and presents a high-performance architecture based on a half-word multiplier. First, because modular inversion is time-consuming, point addition and point doubling are implemented in affine-Jacobian coordinates to avoid it. To implement modular multiplication efficiently over a specific prime field, multiplication is combined with fast modular reduction. The multiplication uses the Karatsuba-Ofman algorithm, and a multiplication structure based on a half-word multiplier is proposed. This structure completes a full-word multiplication in only three clock cycles, compared with six for the traditional multiplication structure. For the specific prime SCA-256 of the Chinese state cryptographic algorithm SM2, a two-stage fast modular reduction algorithm is proposed. It leaves the intermediate result after reduction in the range 0≤Z<2p, instead of 0≤Z<14p as in the traditional algorithm, which avoids iterative subtractions to reach the final result (0≤Z<p). Pipelining is used to optimize the operation schedules of point addition and point doubling and to keep the multiplier fully utilized, improving the efficiency of point multiplication. For performance verification and comparison, the architecture is implemented on Xilinx Virtex-6, Virtex-5 and Virtex-4. Experiments show that the proposed high-performance architecture is 3.18-7.58 times faster than other architectures.

(3) After analyzing the advantages and disadvantages of existing research on CNN hardware architectures, this dissertation proposes a reconfigurable hardware architecture for CNN, together with a design space exploration method based on the roofline model that lets the architecture reach its full performance. Existing FPGA-based CNN accelerators have several shortcomings, such as weak reconfigurability and the lack of effective methods for realizing the accelerator's full performance. First, a reconfigurable four-layer convolution acceleration engine based on a processing element (PE) array is designed, which makes full use of the DSP computing resources provided by the FPGA. Then, after studying computation partitioning and loop unrolling in CNN, and the data storage patterns and data transmission modes under different loop unrollings, a hybrid stationary data storage pattern is proposed. Finally, a novel roofline model for this architecture is presented, and a two-step design space exploration method is proposed to obtain better convolution performance and lower data transmission power. The accelerator is implemented on the Xilinx Zynq-7000 SoC ZC706 evaluation board. Experimental results show that the proposed CNN accelerator outperforms the other accelerators compared in reconfigurability, performance, power consumption and DSP utilization.
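To make the adder count in contribution (1) concrete, here is a minimal Python sketch of the textbook interleaved modular multiplication. It is not the optimized two-adder variant proposed in the dissertation, whose details the abstract does not give. The function name and the `bits` parameter are illustrative assumptions. Each loop iteration performs one addition and up to two subtractions, which is where a three-adder datapath would come from.

```python
def interleaved_modmul(x: int, y: int, m: int, bits: int = 256) -> int:
    """Return (x * y) mod m by scanning the bits of x from MSB to LSB.

    Textbook form, assuming 0 <= x, y < m and m < 2**bits.
    The name and signature are illustrative, not from the dissertation.
    """
    p = 0
    for i in reversed(range(bits)):
        p <<= 1                  # P = 2P, so 0 <= P < 2m
        if (x >> i) & 1:
            p += y               # P = P + Y, so 0 <= P < 3m (the addition)
        if p >= m:
            p -= m               # first correcting subtraction
        if p >= m:
            p -= m               # second correcting subtraction, now P < m
    return p
```

For example, `interleaved_modmul(3, 5, 7, bits=3)` returns 1, since 15 mod 7 = 1.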
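The Karatsuba-Ofman multiplication named in contribution (2) can be sketched as follows, assuming 256-bit operands split into 128-bit half-words. The sketch shows only the arithmetic identity: a full-word product needs three half-word products instead of four. How these map onto the three clock cycles reported in the abstract is not stated there, so no cycle-level behavior is modeled, and the names `HALF` and `karatsuba_256` are illustrative.

```python
HALF = 128                    # assumed half-word width in bits
MASK = (1 << HALF) - 1


def karatsuba_256(a: int, b: int) -> int:
    """Multiply two operands of up to 256 bits using three half-word products."""
    a_hi, a_lo = a >> HALF, a & MASK
    b_hi, b_lo = b >> HALF, b & MASK

    low = a_lo * b_lo                         # half-word product 1
    high = a_hi * b_hi                        # half-word product 2
    # Half-word product 3. The two sums can be one bit wider than a
    # half-word, which a hardware multiplier has to accommodate.
    cross = (a_lo + a_hi) * (b_lo + b_hi)
    mid = cross - low - high                  # equals a_hi*b_lo + a_lo*b_hi

    return (high << (2 * HALF)) + (mid << HALF) + low
```

The result equals `a * b` for any non-negative operands below 2**256. In an ECC datapath the 512-bit product would then go through the modular reduction step.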
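The roofline model behind contribution (3) bounds attainable performance by the smaller of the computational roof and the product of the computation-to-communication (CTC) ratio and the memory bandwidth. The sketch below applies that generic bound to a handful of candidate configurations. It picks the fastest one and breaks ties in favor of the higher CTC ratio, that is, less data moved per operation. This tie-break is only one plausible reading of a two-step exploration. The dissertation's own roofline model and procedure are not described in the abstract, and every name and number here is hypothetical.

```python
from typing import NamedTuple


class Design(NamedTuple):
    name: str
    peak_gops: float   # computational roof of this configuration (GOP/s)
    ctc_ratio: float   # operations per byte of external memory traffic


def attainable_gops(d: Design, bandwidth_gbs: float) -> float:
    """Generic roofline bound: min(compute roof, CTC ratio * bandwidth)."""
    return min(d.peak_gops, d.ctc_ratio * bandwidth_gbs)


def pick_design(designs: list[Design], bandwidth_gbs: float) -> Design:
    # Step 1: find the best attainable performance among all candidates.
    best = max(attainable_gops(d, bandwidth_gbs) for d in designs)
    fastest = [d for d in designs if attainable_gops(d, bandwidth_gbs) == best]
    # Step 2: among equally fast candidates, prefer less data movement.
    return max(fastest, key=lambda d: d.ctc_ratio)


# Hypothetical candidates, for illustration only.
CANDIDATES = [
    Design("A", peak_gops=100.0, ctc_ratio=10.0),
    Design("B", peak_gops=200.0, ctc_ratio=20.0),
    Design("C", peak_gops=200.0, ctc_ratio=40.0),
]
```

With an assumed bandwidth of 10 GB/s, designs B and C both reach the 200 GOP/s roof, and `pick_design(CANDIDATES, 10.0)` selects C because it needs half the memory traffic per operation.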
Keywords/Search Tags:hardware acceleration, elliptic curve cryptography, point multiplication, convolutional neural network, reconfigurable