Font Size: a A A

The Implementation Of High-Speed Elliptic Curve Cryptography On Nvidia GT200Graphic Processing Unit

Posted on:2015-02-26Degree:MasterType:Thesis
Country:ChinaCandidate:S J CuiFull Text:PDF
GTID:2268330431953859Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Elliptic Curve Cryptography (ECC) is a widely-used form of public-key cryptography that allows for a high level of security with relatively short keys. Driven by the ever-increasing demand for performance, the efficient implementation of ECC has become an active area of research in both academic and industry. Graphics Processing Units (GPUs) are nowadays commonly used in general-purpose computing due to their powerful parallel processing capabilities. Consequently, it becomes an excellent accelerator for cryptography computations.This paper describes a high-speed software implementation of Elliptic Curve Cryptography (ECC) for GeForce GTX graphics cards equipped with an NVIDIA GT200Graphics Processing Unit (GPU). Our library is able to perform scalar multiplication on twisted Edwards curves and Montgomery curve defined over Optimal Prime Fields (OPFs).In order to maximize throughput, our ECC software allocates just a single thread per scalar multiplication and aims to launch as many threads in parallel as possible, namely, SIMD. We adopt elliptic curves in Montgomery as well as twisted Edwards form, both defined over a special family of finite fields known as Optimal Prime Fields (OPFs). For random point case, the implementation is performed on Montgomery curve with Montgomery ladder algorithm. For fixed base point case, we employ comb method with256pre-computed points. We also implemented a special variant that does not execute any conditional statements, which helps to reduce side-channel leakage and avoid thread divergence. The points on the twisted Edwards curve are represented in extended projective coordinates so that a point addition can be performed with only seven multiplications in the underlying field.Our ECC library supports OPFs defined by primes of the formp=u·2k+1, whereby u has a length of16bits. We implemented the arithmetic operations in OPFs using a radix-2A24representation of the field elements (i.e. we store24bits in each word), which allows us to take advantage of the native (24x24)-bit integer multiply instruction of the GT200. Furthermore, the24-bit representation facilitates the idea of incomplete modular reduction, i.e. we do not need to always fully reduce the result of a field addition or multiplication.After a detailed evaluation of numerous implementation options and configurations, we managed to launch2880threads on the30multiprocessors of the GT200when the elliptic curve has Montgomery form and is defined over a224-bit OPF. The resulting throughput is115200scalar multiplications per second (for arbitrary base points) and we achieved a minimum latency of19.2ms. In a fixed-base setting with256pre-computed points, the throughput increases to some345417scalar multiplications and the latency drops to4.52ms. The latter result outperforms the fastest implementation reported in the literature by a factor of2.45.
Keywords/Search Tags:ECC, GPU, Montgomery, High-speed, parallelization
PDF Full Text Request
Related items