
Hardware Study On Heterogeneous Multi-core Security Network Processor

Posted on: 2012-09-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y B Xie
GTID: 1228330395457210
Subject: Integrated circuit system design
Abstract/Summary:
High-speed security network processors play an increasingly important role in network development. Supported by a network processor project, this dissertation focuses on the hardware of a security network processor, including the ALU and the security cryptography circuits, and presents five main contributions:

1. The XDNP heterogeneous multi-core security network processor was implemented, with all IP cores developed independently. XDNP consists of one XD-MP Core, six Packet Engines (PEs), one security cryptoprocessor, one SRAM controller, one SDRAM controller, and a Media and Switch Fabric (MSF) interface. There are two kinds of on-chip bus: a control plane bus and a data plane bus. A new on-chip bus architecture based on split transactions is proposed, in which the data plane bus is divided into two parts: a command bus shared by all cores, and several data buses, one per core. This architecture allows different data buses to transfer data at the same time, which yields high throughput and low bus latency.

2. The logic expressions for the block generate and block propagate signals in a fast adder were optimized so that they can be implemented using differential cascode voltage switch with pass-gate (DCVSPG) logic. This method solves the logic conflict problem in the static Manchester carry bypass circuit and eliminates the delay and power cost of the charge stage in the dynamic Manchester carry bypass circuit. It is faster and consumes less power than a carry generate circuit built from CMOS standard cells. The effect of the size of each NMOS transistor in DCVSPG logic on circuit performance is discussed, and a simple delay model of DCVSPG logic was built to evaluate the circuit delay. The delay model can be used to optimize the NMOS transistor sizes in an adder implemented in DCVSPG logic. A 32-bit adder was implemented in DCVSPG logic, and its performance is higher than that of an adder implemented with CMOS standard cells.

3. Modular multiplication and exponentiation severely limit RSA performance. The thesis presents a modified Montgomery modular multiplication algorithm based on a two-level carry-save addition (CSA) tree. By inserting registers, the algorithm shortens the critical path and guarantees that operands arrive at the CSA input ports simultaneously, which significantly improves the speed of modular multiplication. The modular multiplication sequence in modular exponentiation was adjusted, which avoids most format conversions and reduces the conversion time. The proposed modular exponentiation circuit achieves higher performance than most representative designs.

4. Elliptic curve cryptography (ECC) involves a large number of modular multiplication and squaring operations over prime and binary finite fields. For prime finite fields, the traditional Montgomery algorithm was modified and a 2-bit prefix Montgomery modular multiplication circuit was designed. The partial products were then reconstructed based on the inherent characteristics of squaring, which halves the number of partial products. A Half Number Partial-Products modular squaring algorithm was proposed, and a modular squaring circuit was designed based on it; modular squaring takes only half the time of modular multiplication. For binary finite fields, a word-serial modular multiplication algorithm was proposed. Because the multiplicand is shifted left by two words in the new algorithm, the multiplier can be pipelined. The new algorithm also simplifies the calculation of some key variables, which reduces the path delay of the multiplier circuit, so the word-serial modular multiplier computes the modular product quickly. A two-words-serial modular squaring algorithm was also proposed. It applies Montgomery reduction to the squaring result, which is obtained directly from the properties of squaring in binary finite fields. The algorithm handles two words at a time, so modular squaring takes half the time of modular multiplication. Finally, the thesis presents a highly efficient ECC dual-field processor composed of these finite field arithmetic units.

5. A switchable test access mechanism (TAM) architecture was presented in which some IP cores are attached to multiple TAMs through a switching circuit. These IP cores can therefore be tested by several TAMs, which effectively reduces idle time and test time. Each IP core was first allocated to a TAM by 0-1 programming under given constraints, and then a heuristic search algorithm was used to select appropriate IP cores to be tested by multiple TAMs. Experimental results on the ITC 2002 benchmark circuits show that the approach outperforms some other approaches.
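As a reading aid for contribution 2, the block generate and block propagate signals of a fast adder can be modelled in software. The sketch below uses the textbook definitions (g = a AND b, p = a XOR b) and is not the optimized DCVSPG logic expressions from the thesis; the function names and the 4-bit block size are assumptions made for illustration.

```python
def block_gp(a, b, width):
    """Return (G, P) for one block of `width` bits (textbook definitions)."""
    G, P = 0, 1
    for i in range(width):                 # scan from least significant bit
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, p = ai & bi, ai ^ bi            # bit generate / bit propagate
        G = g | (p & G)                    # block generates a carry by itself
        P &= p                             # block passes an incoming carry
    return G, P


def add_blocks(a, b, width=32, block=4):
    """Add two `width`-bit numbers, deriving each block carry from (G, P)."""
    mask = (1 << block) - 1
    carry, result = 0, 0
    for lo in range(0, width, block):
        xa, xb = (a >> lo) & mask, (b >> lo) & mask
        G, P = block_gp(xa, xb, block)
        result |= ((xa + xb + carry) & mask) << lo
        carry = G | (P & carry)            # carry out of this block
    return result, carry


total, carry_out = add_blocks(0xFFFFFFFF, 1)
print(hex(total), carry_out)
```

The point of the block signals is that the carry out of a block depends only on G, P and the incoming carry, so a carry can bypass a whole block instead of rippling through every bit.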
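For contribution 3, a minimal software sketch of the standard radix-2 Montgomery multiplication may help fix ideas. It is the textbook bit-serial form, not the modified two-level CSA-tree algorithm of the thesis, and the names mont_mul and mont_pow are hypothetical. The exponentiation routine converts into Montgomery form once and back once, which is the general idea behind avoiding repeated format conversions.

```python
def mont_mul(a, b, n, k):
    """Return a * b * 2^(-k) mod n for odd n, with a, b < n < 2^k."""
    s = 0
    for i in range(k):
        s += ((a >> i) & 1) * b   # add b when bit i of a is set
        if s & 1:                 # make s even by adding the odd modulus
            s += n
        s >>= 1                   # exact division by 2
    return s - n if s >= n else s


def mont_pow(base, exp, n):
    """Left-to-right square-and-multiply carried out in Montgomery form."""
    k = n.bit_length()
    r2 = pow(2, 2 * k, n)                 # R^2 mod n, with R = 2^k
    x = mont_mul(base % n, r2, n, k)      # base in Montgomery form
    acc = mont_mul(1, r2, n, k)           # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, k)
        if bit == "1":
            acc = mont_mul(acc, x, n, k)
    return mont_mul(acc, 1, n, k)         # convert back to ordinary form


n = 3233                                  # small odd modulus, illustration only
print(mont_pow(7, 65537, n) == pow(7, 65537, n))
```

In hardware, the additions inside the loop are where a carry-save adder tree would replace full carry-propagating adders; this sketch does not model that.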
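For the binary-field part of contribution 4, the property that makes squaring cheap is that squaring a polynomial over GF(2) only spreads its bits apart: the square of the sum of a_i x^i is the sum of a_i x^(2i). The sketch below demonstrates this with Python integers as bit vectors. It uses plain polynomial reduction rather than the Montgomery reduction and two-words-serial datapath described in the thesis, and the field polynomial 0x11B is only an illustrative choice.

```python
def gf2_mul(a, b):
    """Carry-less (polynomial) multiplication over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def gf2_square(a):
    """Square by bit spreading: bit i of a moves to bit 2*i."""
    r, i = 0, 0
    while a:
        if a & 1:
            r |= 1 << (2 * i)
        a >>= 1
        i += 1
    return r


def gf2_mod(a, f):
    """Reduce polynomial a modulo the field polynomial f."""
    df = f.bit_length() - 1
    while a.bit_length() - 1 >= df:
        a ^= f << (a.bit_length() - 1 - df)
    return a


f = 0x11B  # x^8 + x^4 + x^3 + x + 1, illustration only
a = 0x53
print(gf2_mod(gf2_square(a), f) == gf2_mod(gf2_mul(a, a), f))
```

Because no partial products need to be accumulated, the unreduced square is available immediately, and the remaining cost is the reduction step.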
Keywords: Security network processor, Adder, RSA cryptography, Elliptic curve cryptography, Test schedule