
High Energy-efficient Key Techniques In Configurations For Coarse-grained Dynamically Reconfigurable Processor

Posted on: 2015-10-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y S Wang
GTID: 1108330503954627
Subject: Electronic Science and Technology
Abstract/Summary:
Compared with the fine-grained FPGA, the CGRA (Coarse-Grained Reconfigurable Architecture) represents another type of RCP (Reconfigurable Computing Processor). The configuration system configures the key module of the RCP, the RPU (Reconfigurable Processing Unit), and is therefore crucial to RCP design. Reducing the overhead of three design parameters, namely configuration context size, reconfiguration delay and reconfiguration power, is a permanent goal for the configuration system. Although earlier research has addressed these three parameters, no systematic results are available.

To address these three problems, this dissertation proposes four configuration techniques: 1. HCC (Hierarchical Configuration Context) in the top-level design of the configuration system; 2. RCM (Row-based Configuration Mechanism) for the PEA (Processing Element Array); 3. 3DCT (3D Configuration Technique) for the PEA; 4. LIBODM (Lifetime-Based On-chip Data Memory) and the related data-transfer configuration contexts for on-chip data memories.

First, HCC reduces the context size of the whole configuration system by organizing contexts hierarchically. With HCC, the context size is reduced by 76.67% in H.264 decoding and by 82.8%~93.6% in symmetric cryptographic algorithms. In contrast to the high share of runtime spent on reconfiguration in XPP-III, the reconfiguration delay is only 4%~13% of the overall runtime. Second, RCM configures the PEA row by row, which reduces the reconfiguration delay of the PEA. It also eliminates the storage overhead for intermediate data and the configuration overhead for sub-DFGs.
Compared with array-based configuration, RCM improves performance by 35.9%~42.4% and energy efficiency by 16.8%~22.5% in symmetric cryptographic algorithms; in H.264 decoding, it improves performance by 35.9%~42.4% and energy efficiency by 16.8%~22.5%. Third, 3DCT allows different kinds of interconnection structures to be implemented conveniently on the PEA, and it reduces the reconfiguration power of fully dynamic PEAs. Compared with the configuration scheme in ADRES, 3DCT reduces the reconfiguration power by 33.78%~43.77% and the total power consumption by 11.83%~15.55%. Fourth, LIBODM reduces the memory space needed for on-chip data. The on-chip data memory space, normalized by performance, is only 23.8% of that in XPP-III and 14.8% of that in ADRES.

The four configuration techniques have been used in two domain-specific RCPs and are being used in an RCP simulator oriented to general-purpose applications. The multimedia RCP, REMUS_HPP, runs at 200 MHz and achieves 1920×1088 at 30 fps H.264 high-profile decoding, with an energy efficiency 15 times that of XPP-III. The cryptographic RCP, REPROC, runs at 400 MHz and achieves a throughput of 51.2 Gbps in AES-128, with an energy efficiency two orders of magnitude higher than that of a multi-core general-purpose processor.
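As a rough sanity check on the REPROC and REMUS_HPP figures quoted above, the following Python sketch converts them into per-cycle terms. It uses only the clock rates, throughput and resolution stated in the abstract, plus the standard 16×16 H.264 macroblock size. The function names are illustrative assumptions, and the results are arithmetic budgets, not a description of how either chip is actually organized.

```python
# Back-of-envelope arithmetic on the figures quoted in the abstract.
# Function names are hypothetical; only the numeric inputs come from the text.

def bits_per_cycle(throughput_bps: int, clock_hz: int) -> float:
    """Average number of bits that must be produced per clock cycle."""
    return throughput_bps / clock_hz


def cycles_per_macroblock(width: int, height: int, fps: int,
                          clock_hz: int, mb_size: int = 16) -> float:
    """Average clock-cycle budget per macroblock at a given frame rate.

    Assumes 16x16 macroblocks (standard in H.264) and frame dimensions
    that are multiples of the macroblock size, as 1920x1088 is.
    """
    mbs_per_frame = (width // mb_size) * (height // mb_size)
    return clock_hz / (mbs_per_frame * fps)


# REPROC: 51.2 Gbps AES-128 at 400 MHz.
aes_bits = bits_per_cycle(51_200_000_000, 400_000_000)

# REMUS_HPP: 1920x1088 at 30 fps with a 200 MHz clock.
mb_budget = cycles_per_macroblock(1920, 1088, 30, 200_000_000)

print(f"AES-128: {aes_bits:.0f} bits per cycle")
print(f"H.264:   {mb_budget:.0f} cycles per macroblock")
```

Under these assumptions, the AES-128 figure works out to 128 bits per cycle, which matches the AES block size, so the quoted throughput corresponds to an average of one encrypted block per clock cycle. The H.264 figure leaves an average budget of roughly 817 cycles for each macroblock.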
Keywords/Search Tags:CGRA, context size, reconfiguration delay, reconfiguration power