Research On Memory Optimization Algorithms For Reconfigurable Computing

Posted on: 2019-06-02, Degree: Master, Type: Thesis
Country: China, Candidate: T Y Lu, Full Text: PDF
GTID: 2428330590951660, Subject: Integrated circuit engineering
Abstract/Summary:
Coarse-Grained Reconfigurable Architecture (CGRA) is a promising reconfigurable computing platform that combines high computational performance, high power efficiency and attractive flexibility. The computation-intensive loops of many applications are often mapped onto a CGRA for acceleration. A CGRA can fully exploit instruction-level and data-level parallelism, which leads to large-scale parallel data access between the on-chip data memory and the computing array. Therefore, even if a data flow path is constructed successfully on the CGRA for the target application, memory access conflicts will still cause pipeline stalls and severely degrade performance. Architectures with multiple memory banks have been proposed to ease the problem of massively concurrent data access. However, effective loop mapping and data placement methods for multi-bank CGRAs are still lacking. First, this thesis formally defines memory optimization for reconfigurable computing as a joint optimization problem, which includes both a memory partitioning algorithm for eliminating access conflicts and a modulo scheduling algorithm for minimizing the Initiation Interval (II). Second, an efficient memory partitioning algorithm is proposed for the case of multiple memory access patterns over multiple target arrays. We find a separation scheme for each target array and then try to merge the separated arrays so that all target arrays can be stored within the given finite number of memory banks. Third, a data management mechanism is proposed to hide the data re-allocation overhead. The efficient memory partitioning assigns bank indices periodically. Building on this periodicity, we design a fixed-stride data prefetching method that embeds data re-allocation in data prefetching. This mechanism guarantees that the loop mapping process proposed in this thesis incurs no explicit data re-allocation time. Finally, a dual-force directed scheduling algorithm is designed to solve the joint optimization problem. Our method determines the memory access pattern and the distribution of operations by iteratively adjusting the dual force for the memory banks and the PE array until a valid mapping is found for a multi-bank CGRA, that is, one with both a successful memory partitioning and a successful placement and routing scheme. Experimental results on the Livermore, Polybench and Mediabench benchmarks show that our approach improves the performance of loops on a CGRA by 1.89×, 1.49× and 1.37× compared with REGIMap, HTDM and REGIMap+MP, respectively. Moreover, our approach improves the utilization of computing resources on the CGRA by 2.29×, 1.92× and 1.2×, respectively. Our approach also scales well: on a small-scale PE array it achieves computing power equivalent to or even higher than that achieved by the compared techniques on a larger PE array. The compile-time penalty caused by the proposed memory optimization algorithms is within an acceptable range.
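To make the idea of conflict-free memory partitioning with periodic bank indices concrete, the following is a minimal sketch. It assumes a linear (cyclic) bank mapping, bank = (a · x) mod N, which is a common scheme in the memory partitioning literature and not necessarily the algorithm used in the thesis. The names `bank_of`, `is_conflict_free` and `find_partition` are hypothetical, and the brute-force search is for illustration only.

```python
from itertools import product


def bank_of(index, coeffs, num_banks):
    """Bank index of an array element under the assumed linear mapping."""
    return sum(a * x for a, x in zip(coeffs, index)) % num_banks


def is_conflict_free(pattern, coeffs, num_banks):
    """True if every offset of the access pattern lands in a distinct bank.

    The mapping is linear, so moving the base address shifts all bank
    indices by the same amount; checking the offsets alone is sufficient.
    """
    banks = [bank_of(offset, coeffs, num_banks) for offset in pattern]
    return len(set(banks)) == len(banks)


def find_partition(pattern, max_banks):
    """Exhaustively search for the smallest bank count (and a coefficient
    vector) that serves all accesses of the pattern in the same cycle.

    Returns (num_banks, coeffs), or None if max_banks is too small.
    """
    dims = len(pattern[0])
    for num_banks in range(len(pattern), max_banks + 1):
        for coeffs in product(range(num_banks), repeat=dims):
            if is_conflict_free(pattern, coeffs, num_banks):
                return num_banks, coeffs
    return None


# Illustrative 5-point stencil: A[i][j] and its four neighbours.
stencil = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]
result = find_partition(stencil, max_banks=8)
```

For this stencil the search settles on five banks, so the five accesses of one loop iteration can be served in the same cycle without a stall. Under such a mapping the bank index repeats with period N along each array dimension, which is the kind of periodicity that a fixed-stride prefetcher could exploit.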
Keywords/Search Tags: Reconfigurable Computing, Memory Optimization, Force-directed Scheduling, Memory Partitioning