
Research on Loop Pipelining Compilation Technique for Heterogeneous Fine-grained Reconfigurable Systems

Posted on: 2012-11-02
Degree: Master
Type: Thesis
Country: China
Candidate: M Yang
GTID: 2218330368982447
Subject: Computer application technology
Abstract/Summary:
High-performance computing (HPC) is a focus of industry attention as applications demand ever greater speed. The FPGA, as the engine of reconfigurable computing, can translate the inherent parallelism of reconfigurable logic devices into supercomputing capability. Because high computing performance can be obtained from the FPGA's flexibility and its advantages in parallel computing, heterogeneous reconfigurable systems based on FPGAs are gradually becoming an important trend in HPC. However, reconfigurable computing lacks fully automatic compilation tools, and the drawbacks of manual design become harder to tolerate as the algorithmic complexity of applications increases. C2VHDL compilation technology bridges this gap between high-performance computing and the design process, which is why it has become a focus of recent academic research.

Exploiting loop parallelism efficiently is the key to improving overall system performance, because loops account for nearly all of an application's running time. Researchers in China and abroad have studied loop pipelining (LPp) extensively for this reason, but many shortcomings remain: loop-carried dependences are not supported, too many on-chip resources are consumed in exchange for speed, the design of the memory system is ignored, and so on.

To address these shortcomings, this thesis proposes a high-bandwidth loop pipelining compilation technique for const-step loops. Taking the target architecture into account, loops are translated into a high-level parallel computing model written in VHDL. The control, memory, and computing modules of the LPp IP core are generated separately.
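As general background on why loop pipelining improves performance, the following minimal sketch compares cycle counts for a loop executed sequentially and in pipelined form. It uses the standard textbook estimate, in which a new iteration starts every `ii` cycles (the initiation interval) and each iteration takes `depth` cycles to complete. The function names and parameters are illustrative assumptions and are not taken from the thesis.

```python
def sequential_cycles(n_iters: int, depth: int) -> int:
    """Cycles when each iteration must finish before the next one starts."""
    return max(n_iters, 0) * depth


def pipelined_cycles(n_iters: int, depth: int, ii: int) -> int:
    """Cycles when a new iteration is issued every `ii` cycles.

    The first iteration needs `depth` cycles to drain through the pipeline;
    each of the remaining iterations adds `ii` cycles.
    """
    if n_iters <= 0:
        return 0
    return depth + (n_iters - 1) * ii


def speedup(n_iters: int, depth: int, ii: int) -> float:
    """Ratio of sequential to pipelined cycle counts (hypothetical helper)."""
    return sequential_cycles(n_iters, depth) / pipelined_cycles(n_iters, depth, ii)


if __name__ == "__main__":
    # Illustrative numbers only: 100 iterations, a 5-cycle loop body.
    for ii in (1, 3, 5):
        print(ii, pipelined_cycles(100, 5, ii), round(speedup(100, 5, ii), 2))
```

In this model a loop-carried dependence shows up as a larger `ii`: when `ii` equals `depth`, the pipelined loop is no faster than the sequential one.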
For the control module, this thesis proposes a variable initiation interval iterative modulo scheduling algorithm (VⅡ_IMS) based on array data dependence analysis. It improves throughput markedly for loops with severe loop-carried dependences, and the control module generated from it supports break control structures. For the computing module, this thesis proposes an improved partially compacted pipeline partitioning technique, in which short-delay pipeline stages are compacted without lowering the hardware's maximum frequency. In the computing module's RTL-level file, each process consists of the instructions belonging to the same stage. For the memory module, this thesis proposes a self-adaptive memory algorithm that uses the application's inherent memory access characteristics to generate a parameterized parallel memory architecture model. This memory model supports data reuse for input dependences, data reuse for flow dependences, and parallel data access. With a pipeline parallelism of 8, this memory architecture increases loop pipeline throughput by a factor of 7 to 12.

The thesis thus offers efficient solutions to the problems identified in LPp research. Experimental results show that the proposed high-bandwidth loop pipelining technique achieves good pipelining performance and speedup. The work has academic value for advancing the theory of LPp technology.
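The abstract does not describe how VⅡ_IMS works internally, so the sketch below does not attempt to reproduce it. It only illustrates the fixed lower bound on the initiation interval used in conventional iterative modulo scheduling: the larger of a resource-constrained bound (ResMII) and a recurrence-constrained bound (RecMII). The function names and the input layout (per-resource operation counts, and dependence cycles given as latency/distance pairs) are assumptions made for illustration.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive b."""
    return -(-a // b)


def res_mii(op_counts: dict, unit_counts: dict) -> int:
    """Resource bound: for each resource type, operations per iteration
    divided by the number of available units, rounded up."""
    return max(
        (ceil_div(op_counts[r], unit_counts[r]) for r in op_counts),
        default=1,
    )


def rec_mii(dep_cycles: list) -> int:
    """Recurrence bound: for each dependence cycle, total latency divided by
    total iteration distance, rounded up. No cycles means no constraint."""
    return max(
        (ceil_div(latency, distance) for latency, distance in dep_cycles),
        default=1,
    )


def min_ii(op_counts: dict, unit_counts: dict, dep_cycles: list) -> int:
    """Minimum initiation interval: the tighter of the two bounds."""
    return max(res_mii(op_counts, unit_counts), rec_mii(dep_cycles))


if __name__ == "__main__":
    # Hypothetical loop body: 4 memory ops on 2 ports, 6 ALU ops on 3 ALUs.
    ops = {"mem": 4, "alu": 6}
    units = {"mem": 2, "alu": 3}
    # Hypothetical dependence cycles as (total latency, total distance).
    cycles = [(6, 1), (4, 2)]
    print(res_mii(ops, units), rec_mii(cycles), min_ii(ops, units, cycles))
```

In this example the recurrence bound dominates, which is the situation the abstract describes as a severe loop-carried dependence: adding hardware units would not lower the interval.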
Keywords/Search Tags: heterogeneous reconfigurable, C2VHDL, high-bandwidth LPp, VⅡ_IMS, self-adaptive memory