Font Size: a A A

Research On Efficient Hardware Implementation Of Lattice-based Post-quantum Cryptography

Posted on:2022-03-27Degree:DoctorType:Dissertation
Country:ChinaCandidate:C ZhangFull Text:PDF
GTID:1480306572975699Subject:Microelectronics and Solid State Electronics
Abstract/Summary:PDF Full Text Request
Lattice-based post-quantum cryptography,due to its resistance to quantum attack,easy of hardware implementation and versatility,has raised wide attention of the international academia and industry.This paper takes the efficient hardware implementation of lattice-based cryptography as the main research object,focusing on design goals of low cost,high hardware efficiency,flexibility and side-channel security.Based on optimized circuit design of the two core operators,modular arithmetic and Gaussian sampling,a generic Ring-LWE cryptographic processor architecture is proposed,and the ASIC implementation of key exchange protocol New Hope-Simple is completed.The specific research work of this paper mainly includes the following four aspects:First of all,for the three important modular arithmetics,modular addition,modular multiplication and modular polynomial multiplication,efficient hardware implementation technologies of corresponding modular arithmetic units are studied respectively.A CSA-based modular adder structure is designed,which effectively shortens the long critical path in traditional modular adder.In order to meet the requirements of low resource overhead and high performance,by using the CSA-based modular adder,a low-cost modular multiplier structure based on cyclic-shift-modular addition method and a pipelined modular multiplier structure based on fast look-up tables are designed respectively.According to the different types of modulus q in lattice-based cryptography schemes,the corresponding butterfly arithmetic unit(BAU)and“ping-pong”memory access scheme are designed,which eliminates the bottleneck of SRAM access and improves the performance of NTT operation.In addition,the calculation problem of PWM operation under special modulus q is solved,and the storage space for twiddle factors is optimized.The NTT units respectively using NWC and parallel calculation are implemented on the Xilinx Artix-7FPGA platform.Compared with the state-of-the-art related NTT design,24%of the FFs and 50%of the DSPs usage are reduced,and the area-time-product(ATP)is increased by up to 1.5 times,which can effectively accelerate the polynomial modular multiplication,a computationally expensive operation in lattice-based cryptography schemes.And then,aiming at Gaussian sampler that is the most vulnerable module to side-channel attack in lattice-based cryptosystem,detailed summary of the existing side-channel attacks and corresponding countermeasures are carried out,and the Gaussian sampling algorithms suitable for lattice-based cryptography schemes are analyzed in a comprehensive comparison.A CDT Gaussian sampler structure with timing attack countermeasure is proposed,which can save nearly 65%of storage space for pre-computed data.The designed sampling step ensures constant-time sampling,and its ability to resist timing attacks is verified on the timing attack test board built of a Xilinx Spartan-3A FPGA chip.Then,the SPA vulnerability of BS-CDT Gaussian sampler is analyzed and concluded by establishing a mathematical function model,and a chosen input SPA attack scheme is designed.Based on Hiding technology,additional circuit structures with randomized power consumption information are added to the BS-CDT Gaussian sampler,a configurable BS-CDT Gaussian sampler structure is proposed,and finally its SPA attack countermeasure is tested and verified on the self-built SPA attack test platform.With a sampling precision of 112-bit and a sampling bound in[-55,55],our design can generate a Gaussian sample in8 clock cycles consuming only 122 Slices on a Xilinx Spartan-6 FPGA,while being able to resist both timing attacks and SPA attacks.Afterwards,in order to meet the hardware implementation of different Ring-LWE lattice-based cryptography schemes in diversified application scenarios,based on the modular arithmetic units and Gaussian samplers,a configurable datapath that can support operations under any security parameters for Ring-LWE lattice-based cryptography schemes is designed.An efficient data storage scheme is proposed,which can effectively simplify the calculation of data addresses for modular polynomial operations.By analyzing the specific operation steps of Ring-LWE public-key encryption scheme,two types of micro-instructions,data configuration and data operation,are designed,and all the operations in Ring-LWE public-key encryption scheme can be described.Adopting the design idea of micro-instruction and bus multiplexing,a reconfigurable Ring-LWE cryptographic processor architecture that can flexibly configure security parameters,dynamically select operation flows,and support for a variety of Ring-LWE lattice-based cryptography schemes is proposed.When the security parameter sets are(n=256,q=7681,s=11.31),the processor can execute an encryption/decryption operation on a 256-bit message in 4.5/0.9 ms with an 80MHz clock whilst it consumes only 1307 LUTs,889 FFs,4 BRAMs on the Xilinx Spartan-6 FPGA platform.Finally,with the optimization goal of low hardware resource consumption and high computing performance,the post-quantum key exchange protocol New Hope-Simple,a specific lattice-based cryptography scheme,is implemented from FPGA prototype system to ASIC in advanced process.The algorithms and operation steps of New Hope-Simple are analyzed,and the advantages and feasibility of using Trivium stream cipher as the PRNG in cryptosystem are demonstrated.The Parse function unit and binomial sampler are designed by reusing a same Trivium x64 structure,which are used for sampling from uniform distribution and centered binomial distribution under modulus q,respectively.The architectures of the server and client are designed,and the operation and data flow are optimally arranged by the idea of parallel calculation.All the operations of key exchange process can be completed in 25148 and 23887 clock cycles,respectively.The ASIC implementation of post-quantum key exchange system composed of the server and the client is evaluated in TSMC 28nm HPC+CMOS process.The total equivalent gates are472K with occupying 0.504mm~2 area,the maximum frequency is 80MHz and the average power consumption is 86m W.
Keywords/Search Tags:Lattice-Based Post-Quantum Cryptography, Hardware Efficiency, Modular Arithmetic, Gaussian Sampling, Side-Channel Attack
PDF Full Text Request
Related items