
Research And Implementation On Key Technologies Of Programmable Cryptographic Processors

Posted on: 2007-11-22 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: X M Zhao
GTID: 1118360215970561 | Subject: Computer Science and Technology
Abstract/Summary:
Cryptographic algorithms (CAs) are widely used to meet security requirements such as confidentiality, integrity and availability. For both performance and implementation-security reasons, CAs often need to be realized in hardware. Application-specific integrated circuits (ASICs) and fine-grain reconfigurable structures (FRSs) are the two traditional approaches. A well-known drawback of ASIC solutions is their low flexibility; FRSs are sufficiently flexible but suffer significant overhead because of their generic nature. CAs, however, have relatively fixed granularity and similar processing patterns. Researchers have therefore proposed several cryptography-specific reconfigurable structures based on spatial programmability and several cryptographic processors based on temporal programmability, and these works achieve good trade-offs between performance and flexibility. Existing reconfigurable structures nevertheless see limited practical use, because CAs are difficult to map onto them. Cryptographic processors, on the other hand, make algorithm development convenient through a compiler, but their data paths are constrained by traditional architectures and cannot accelerate CAs efficiently.
Starting from temporal programmability, this dissertation shifts the hardware/software interface downwards and lets software specify every data transport together with its routing path. This addresses the difficulty of designing complex but efficient data paths in traditional architectures. For different classes of cryptographic algorithms and application environments, several practical programmable cryptographic processors are proposed and implemented. The main work and results are:
1. We propose an automatic generation method for application-specific instruction-set processors (ASIPs) based on the transport triggered architecture (TTA).
In a TTA, software specifies the data transports among function units (FUs), so application-specific hardware can incorporate more sophisticated FUs, and the problems of instruction generation and retargetable compilation are solved at the same time. We further propose the configuration stream driven computing architecture (CSDCA), in which routing is performed by the compiler so that efficient but complex interconnections can be supported. Combined with segmented buses, this solves the problem that, as the number of FUs grows, the TTA interconnection network becomes a frequency bottleneck and consumes substantial extra power on each data transport. RISC|CSDCA dual-mode computing is proposed to improve code density: computation-intensive loops, which account for most of the execution time, run in CSDCA mode for higher performance, while the remaining code runs in RISC mode to reduce code redundancy. Together, these works form an ASIP design flow that supports efficient but complex data paths.
2. We propose and implement a high-performance modular exponentiation (ME) processor. A radix-length based high-radix Montgomery modular multiplication (RBHRMMM) algorithm is proposed; with it, an ME operation can be decomposed into a series of primitive operation (PO) matrices. A column sharing super-pipelining array (CSSA) is designed to execute these PO matrices. Combined with the above ASIP design flow, this yields a complete ME processor, SEA-II, which achieves a decryption rate of 6.35 Mbps for 1024-bit RSA.
3. We propose a dual-field scalable processor that implements complete public key cryptosystems. A dual-field unified RBHRMMM algorithm is proposed, and a row sharing super-pipelining array (RSSA) is designed on the basis of this algorithm. By embedding the RSSA in the above ASIP design flow, a scalable public key processor, SPKP, is implemented.
SPKP has the following characteristics: (I) complete ECC algorithms can be developed conveniently with the TTA tool chain; (II) the RSSA is scalable; (III) the pipeline elements compute vector products and support both GF(p) and GF(2^n); (IV) different performance/area constraints can be met by adjusting the bus width and the number of RSSA pipeline elements.
4. We propose a high-performance cryptographic hash processor. We propose a novel method for decomposing hash algorithms: the kernel of a hash algorithm can be split into compression modules and an expansion module, and every module has the same structure, comprising a query, a fusion sub-module and an accumulator. Custom reconfigurable FUs are designed based on this method, and by integrating them into the ASIP design flow, a cryptographic hash processor, PSHP, is implemented. Compared with fine-grain reconfigurable architectures, PSHP is faster and more area-efficient; compared with ASICs, it supports the widely used hash algorithms with little overhead.
5. We propose a high-performance block cipher processor, PSCP, based on two optimization principles: (I) coupling a substitution unit with a sub-key storage unit reduces the number of memory accesses in kernels to zero; (II) basic operations are reorganized to balance the delay distribution. Compared with ASIC solutions, PSCP achieves similar performance in CBC, CFB and OFB modes while offering more flexibility. Compared with custom reconfigurable structures, PSCP offers a more convenient development method and supports the complete algorithm, including key expansion, which makes it more secure and more usable.
All of these processors are implemented in a 0.18 μm 1P6M CMOS process, and the ME processor has been sold commercially.
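Contribution 1 rests on the TTA idea that a program consists of data moves rather than operations. The sketch below is a hypothetical toy model, not the author's tool chain or CSDCA: each instruction is a bundle of moves issued in parallel, writing an FU's operand port (`.o`) only latches a value, writing its trigger port (`.t`) starts the operation, and later moves can read its result port (`.r`). `TTAMachine`, `FunctionUnit` and the port-naming scheme are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical toy TTA: the program is made only of moves between ports.
# Port names: "fu.o" = operand (latch only), "fu.t" = trigger, "fu.r" = result.


@dataclass
class FunctionUnit:
    op: Callable[[int, int], int]
    operand: int = 0
    result: int = 0

    def trigger(self, value: int) -> None:
        # Writing the trigger port fires the FU on (latched operand, value).
        self.result = self.op(self.operand, value)


@dataclass
class TTAMachine:
    fus: Dict[str, FunctionUnit]
    regs: Dict[str, int] = field(default_factory=dict)

    def read(self, src: str) -> int:
        if src.endswith(".r"):
            return self.fus[src[:-2]].result
        return self.regs[src]

    def write(self, dst: str, value: int) -> None:
        if dst.endswith(".o"):
            self.fus[dst[:-2]].operand = value
        elif dst.endswith(".t"):
            self.fus[dst[:-2]].trigger(value)
        else:
            self.regs[dst] = value

    def run(self, program: List[List[Tuple[str, str]]]) -> None:
        for bundle in program:
            # Read every source first, as if all buses were driven in the same cycle.
            moves = [(dst, self.read(src)) for src, dst in bundle]
            # Latch non-trigger destinations before firing triggers, so an operand
            # moved in the same bundle is seen by the triggered operation.
            for dst, value in sorted(moves, key=lambda mv: mv[0].endswith(".t")):
                self.write(dst, value)


# (r0 + r1) * r2, with the sum bypassed straight from add.r to mul.t.
m = TTAMachine(
    fus={"add": FunctionUnit(lambda a, b: a + b), "mul": FunctionUnit(lambda a, b: a * b)},
    regs={"r0": 3, "r1": 4, "r2": 5},
)
m.run([
    [("r0", "add.o"), ("r1", "add.t"), ("r2", "mul.o")],
    [("add.r", "mul.t")],
    [("mul.r", "r3")],
])
print(m.regs["r3"])  # 35
```

In this toy program the sum travels from `add.r` directly to `mul.t` without passing through a register; exposing such transports to software is what allows the compiler to schedule them explicitly.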
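The RBHRMMM algorithm of contribution 2 is not spelled out in this abstract. As a point of reference only, the sketch below shows textbook digit-serial (radix-2^w) Montgomery multiplication, assuming an odd modulus n, and uses it in left-to-right square-and-multiply, which is how a modular exponentiation reduces to a stream of modular multiplications. It is not the author's algorithm or the SEA-II data path; `mont_mul`, `mod_exp`, `w` and `s` are hypothetical names.

```python
def mont_mul(a: int, b: int, n: int, w: int, s: int) -> int:
    """Radix-2^w Montgomery product a*b*R^-1 mod n, with R = 2^(w*s).

    Assumes n is odd, n < R and a, b < n. Hypothetical helper.
    """
    mask = (1 << w) - 1
    n_prime = (-pow(n, -1, 1 << w)) & mask   # -n^-1 mod 2^w (Python 3.8+)
    t = 0
    for i in range(s):
        a_i = (a >> (w * i)) & mask          # i-th radix-2^w digit of a
        t += a_i * b                         # digit-by-integer multiply-accumulate
        q = ((t & mask) * n_prime) & mask    # chosen so t + q*n is divisible by 2^w
        t = (t + q * n) >> w                 # exact division by the radix
    return t - n if t >= n else t            # t < 2n, so one subtraction suffices


def mod_exp(x: int, e: int, n: int, w: int = 16) -> int:
    """x^e mod n via left-to-right square-and-multiply on Montgomery products."""
    s = -(-n.bit_length() // w)              # number of radix-2^w digits in n
    r = 1 << (w * s)
    x_m = (x * r) % n                        # x in the Montgomery domain
    acc = r % n                              # 1 in the Montgomery domain
    for bit in bin(e)[2:]:
        acc = mont_mul(acc, acc, n, w, s)
        if bit == "1":
            acc = mont_mul(acc, x_m, n, w, s)
    return mont_mul(acc, 1, n, w, s)         # leave the Montgomery domain


print(mod_exp(4, 13, 497))  # 445
```

Because R is a power of two, each reduction step needs only the low w bits of `t` and a right shift, never a trial division by n. A larger radix cuts the loop to `s` iterations at the cost of wider digit multipliers, which is the usual high-radix trade-off.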
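The dual-field unified algorithm behind SPKP (contribution 3) is likewise not detailed here. The sketch below illustrates only the general idea of a dual-field Montgomery multiplier in its simplest radix-2 form, assuming an odd modulus p or an irreducible polynomial f(x) with constant term 1: the same loop computes a*b*2^(-n) mod p when additions propagate carries, and a(x)*b(x)*x^(-n) mod f(x) when they are carry-free XORs. This is a textbook illustration, not the RSSA design; `dual_field_mont` and its parameters are hypothetical.

```python
def dual_field_mont(a: int, b: int, m: int, nbits: int, binary: bool) -> int:
    """Radix-2 Montgomery product in GF(p) or GF(2^n) (hypothetical helper).

    GF(p) mode (binary=False): m = odd modulus p < 2^nbits, a, b < p;
        returns a*b*2^(-nbits) mod p. Additions propagate carries.
    GF(2^n) mode (binary=True): m = irreducible f(x) of degree nbits with
        constant term 1 (bit i = coefficient of x^i), deg a, deg b < nbits;
        returns a(x)*b(x)*x^(-nbits) mod f(x). Additions are carry-free XORs.
    """
    add = (lambda u, v: u ^ v) if binary else (lambda u, v: u + v)
    t = 0
    for i in range(nbits):
        if (a >> i) & 1:
            t = add(t, b)           # accumulate the partial product
        if t & 1:                   # same parity test in both fields
            t = add(t, m)           # m is odd, so this clears bit 0
        t >>= 1                     # exact division by 2, or by x
    if not binary and t >= m:       # GF(p) result is < 2p; one subtraction
        t -= m
    return t


# AES field GF(2^8), f(x) = x^8 + x^4 + x^3 + x + 1 (0x11B).
# Multiplying the Montgomery product by x^16 mod f = 0x5E cancels both x^-8 factors.
c = dual_field_mont(dual_field_mont(0x57, 0x83, 0x11B, 8, True), 0x5E, 0x11B, 8, True)
print(hex(c))  # 0xc1, the worked example in FIPS-197
```

Only the adder differs between the two modes, which is the property that lets a unified multiplier share one array between GF(p) and GF(2^n).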
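Contribution 4 splits a hash kernel into compression modules and an expansion module. The abstract does not describe the internal query/fusion/accumulator structure, so the sketch below illustrates only the coarser split, using SHA-256 (FIPS 180-4) as one representative of the widely used hash algorithms: `expand` derives the 64-word message schedule from a 512-bit block, `compress` runs the 64 rounds, and the final word-wise addition folds the round output into the chaining value. The function names and this mapping onto the author's modules are assumptions for illustration; the round constants and initial hash value are computed from the cube and square roots of the first primes, as the standard defines them, instead of being listed.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def _primes(count):
    ps, n = [], 2
    while len(ps) < count:
        if all(n % p for p in ps):
            ps.append(n)
        n += 1
    return ps


def _icbrt(x):
    # Integer cube root (floor) by binary search.
    lo, hi = 0, 1 << (x.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


# First 32 fractional bits of cube roots (K) and square roots (H0) of the first primes.
K = [_icbrt(p << 96) & MASK for p in _primes(64)]
H0 = [math.isqrt(p << 64) & MASK for p in _primes(8)]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def expand(block):
    """Expansion module: 16 message words -> 64 schedule words W[t]."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    return w


def compress(state, w):
    """Compression module: 64 rounds, then add the result into the chaining value."""
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(msg: bytes) -> bytes:
    # Padding: 0x80, zeros, then the 64-bit big-endian message length in bits.
    padded = msg + b"\x80" + b"\x00" * ((55 - len(msg)) % 64) + struct.pack(">Q", 8 * len(msg))
    state = list(H0)
    for i in range(0, len(padded), 64):
        state = compress(state, expand(padded[i:i + 64]))
    return struct.pack(">8I", *state)


print(sha256(b"abc") == hashlib.sha256(b"abc").digest())  # True
```

Because `expand` depends only on the message block and `compress` only on the schedule and chaining state, the two stages can in principle be mapped onto separate units.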
Keywords/Search Tags: cipher processing, public key cryptosystem, block cipher, secure hash, code compression, network-on-chip, configuration stream driven, application specific instruction-set processor