
The Design Of A 64-bit High-Performance DSP Parallel Memory Unit

Posted on: 2017-12-28    Degree: Master    Type: Thesis
Country: China    Candidate: X T Kong    Full Text: PDF
GTID: 2348330536967223    Subject: Software engineering
Abstract/Summary:
As application requirements grow, SIMD architecture has become an important extension of high-performance DSPs. A SIMD DSP usually integrates many parallel processing units that operate in single-instruction, multiple-data-stream mode to exploit the data-level parallelism of applications effectively. However, it also faces issues such as limited memory bandwidth, parallel access conflicts, and SIMD address alignment, which seriously reduce memory access efficiency and lower the DSP's actual computing performance. Research into efficient, high-bandwidth parallel memory units is therefore important. X64-DSP is a core of a multi-core DSP developed independently by our project team for high-performance computing. It adopts VLIW+SIMD technology and a scalar-vector parallel processing architecture. Based on the X64-DSP architecture, this thesis designs a parallel memory unit (PMU) that supports scalar-vector parallel access and achieves low-conflict, high-bandwidth parallel data access. The main work and innovations are as follows:
(1) The peak data bandwidth requirements of X64-DSP's arithmetic units are analyzed, the principle and blocked implementation of the important GEMM algorithm are studied, and an outline scheme for the PMU is proposed.
(2) A dedicated scalar-vector memory access instruction set is designed for the PMU. It supports linear addressing, circular addressing, and multiple memory access granularities, which improves access flexibility.
(3) A PMU with a local memory mode is implemented, supporting pipelined scalar-vector parallel access. The scalar memory unit of the PMU can access the scalar storage space and the configuration space inside and outside the core, and can use idle access bandwidth to supply data for the vector computation of the matrix multiplication algorithm. The vector memory unit of the PMU provides high-bandwidth SIMD data to the parallel processing units. Both the scalar memory and the vector memory support a double-buffer structure for data movement, which effectively hides DMA transfer latency and improves parallel access efficiency.
(4) Hierarchical verification, covering module-level and system-level verification, is applied to the design, and the results meet the coverage requirements. A C-based method for generating random system-level test cases is adopted to improve the coverage of system-level functional verification. GEMM applications of two typical sizes are used to test X64-DSP's performance, and the results show that the PMU efficiently supports GEMM in achieving high computational efficiency.
(5) The PMU is synthesized with the Synopsys Design Compiler synthesis tool using a 40 nm standard cell library and a 500 ps clock period constraint, and the results show that the design meets the timing requirements.
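The double-buffer idea mentioned in item (3) can be illustrated with a minimal C sketch of blocked GEMM. This is not the thesis's implementation: the sizes `N` and `TILE`, the helper `load_panel`, and the function `gemm_double_buffered` are hypothetical names chosen for illustration, and `memcpy` stands in for a DMA transfer. On real hardware the load of the next panel would run concurrently with the computation on the current one; here the two steps simply run in sequence to show the buffer alternation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative sizes only; they are not taken from the thesis. */
enum { N = 8, TILE = 4 };

/* Stand-in for a DMA transfer: copy TILE rows of B (starting at row k0)
 * from "external" memory into a local buffer. */
static void load_panel(double *dst, const double *b, size_t k0)
{
    memcpy(dst, b + k0 * N, sizeof(double) * TILE * N);
}

/* Computes C = A * B for row-major N x N matrices, streaming B through
 * two local buffers that alternate roles (ping-pong). */
static void gemm_double_buffered(const double *a, const double *b, double *c)
{
    double buf[2][TILE * N];
    size_t cur = 0;

    assert(N % TILE == 0);               /* the sketch handles whole tiles only */
    memset(c, 0, sizeof(double) * N * N);
    load_panel(buf[cur], b, 0);          /* prime the first buffer */

    for (size_t k0 = 0; k0 < N; k0 += TILE) {
        size_t next = cur ^ 1;

        /* Fetch the next panel into the idle buffer. With a real DMA
         * engine this would overlap with the compute loop below. */
        if (k0 + TILE < N)
            load_panel(buf[next], b, k0 + TILE);

        /* Accumulate the contribution of the current panel into C. */
        for (size_t i = 0; i < N; ++i) {
            for (size_t kk = 0; kk < TILE; ++kk) {
                double aik = a[i * N + k0 + kk];
                for (size_t j = 0; j < N; ++j)
                    c[i * N + j] += aik * buf[cur][kk * N + j];
            }
        }
        cur = next;                      /* swap buffer roles */
    }
}
```

Calling `gemm_double_buffered(a, b, c)` on three `N * N` arrays of `double` gives the same result as a plain triple-loop matrix multiply. The only difference is that B is consumed from whichever local buffer was filled during the previous iteration.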
Keywords/Search Tags: SIMD, parallel memory unit (PMU), access conflict, unaligned access, GEMM