
Researches On On-chip Parallel Data Access Techniques For SIMD DSPs With Very Wide Data Path

Posted on: 2013-03-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Liu
GTID: 1268330392973872
Subject: Electronic Science and Technology
Abstract/Summary:
With the evolution of embedded applications and the progress of chip design technology, the Single Instruction stream Multiple Data streams (SIMD) technique, which can fully exploit the parallelism of applications at a lower hardware cost, has been broadly adopted in current DSPs. The data path width of SIMD DSPs has grown from 4~8 bytes to 32~64 bytes. The Very Wide SIMD technique can bring high efficiency to the system. Meanwhile, Very Wide SIMD DSPs also face many memory problems, such as a shortage of effective data bandwidth, too many memory conflicts, and the large cost of shuffle operations. It is therefore important to study efficient memory design for Very Wide SIMD DSPs.

The on-chip Parallel Memory (PM) technique is a promising solution to most memory access problems in Very Wide SIMD DSPs. This dissertation focuses on on-chip PM issues in Very Wide SIMD DSPs: how to reduce the hardware cost (area and power) of the PM, how to give a unified PM framework for general-purpose DSPs, how to reduce conflicts within the PM, and how to design an efficient shuffle unit that cooperates with the PM. The main contributions are summarized as follows:

1) We propose a Bilinear Skewed Parallel 2D Memory (BilisPM). The number of memory modules needed in BilisPM is the same as the SIMD width, and each memory module is twice as wide and half as deep. The bilinear mapping function lets BilisPM support conflict-free access types (row, column, block, etc.) and circular addressing in the X and Y directions of the 2D space. BilisPM can effectively reduce the area cost of the on-chip PM, and its controller has a smaller chip area and a reasonable critical path delay.

2) We propose a Low-Power 2D (LP2D) memory based on data reuse between adjacent accesses, according to the 2D memory access features of sliding-window applications. By judging the correlation between adjacent 2D memory requests with an adjacent address checker, LP2D can create a bank control mask and turn off some bank address generators and bank access logic. LP2D achieves an obvious reduction in power consumption at a reasonable hardware cost.

3) We propose a Polymorphic Parallel Memory (PPM) scheme oriented toward application domains, according to the memory access features of communication and video processing algorithms. PPM can efficiently handle irregular memory accesses by using a 1D/2D configurable PM and a two-level cooperation scheme between the memory and the registers. PPM can provide a unified PM framework for high-performance general-purpose DSPs by trading off hardware cost against efficiency. Experimental results show that PPM has a moderate hardware cost and can effectively compress the code size and improve system performance.

4) We propose the Vector DMA Cache (VDC) technique to reduce the conflicts within the on-chip PM that result from several clients. By combining scattered DMA requests into a cache line request, VDC can effectively reduce the number of VM accesses from the DMA and decrease the VM access conflicts between the DMA and the Vector Processing Unit (VPU). VDC thus relieves the conflict problem between the VPU and the DMA and improves system performance. Moreover, the role of VDC becomes more evident as the SIMD width increases.

5) We propose a programmable shuffle unit with an efficient shuffle mode memory and introduce a novel shuffle matrix partitioning method: the odd-even partition. With the shuffle mode memory and the corresponding shuffle instructions, programs can execute efficiently because shuffle operations do not occupy the system's key resources: the general registers and the memory bandwidth. The odd-even partition, which has a smaller data selecting span value and is stable under circular shift, has advantages in dealing with data going into or out of the Crossbar.

Most of the above techniques have been or will be applied in our FT-Matrix series DSPs. They can serve as useful examples for designers of high-performance DSPs.
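
To make the idea of a skewed parallel memory in contribution 1 more concrete, the following is a minimal Python sketch of the classic linear skewing scheme. It is not the bilinear mapping function of BilisPM, which the abstract does not specify. The function names, the `skew` parameter, and the 8-bank configuration are all assumptions chosen for illustration.

```python
# Illustrative sketch only: classic linear skewing, NOT the BilisPM bilinear
# mapping (the abstract does not give that function). All names are hypothetical.

def bank_of(x, y, n_banks, skew=1):
    """Map 2D coordinate (x, y) to a memory bank using linear skewing."""
    return (x + skew * y) % n_banks


def is_conflict_free(coords, n_banks, skew=1):
    """True if every coordinate in one parallel access hits a distinct bank."""
    banks = [bank_of(x, y, n_banks, skew) for x, y in coords]
    return len(set(banks)) == len(banks)


def row_access(x0, y0, n):
    """n consecutive elements along X, starting at (x0, y0)."""
    return [(x0 + i, y0) for i in range(n)]


def column_access(x0, y0, n):
    """n consecutive elements along Y, starting at (x0, y0)."""
    return [(x0, y0 + j) for j in range(n)]


def block_access(x0, y0, width, height):
    """A width x height block whose top-left corner is (x0, y0)."""
    return [(x0 + i, y0 + j) for j in range(height) for i in range(width)]


if __name__ == "__main__":
    n = 8  # assumed SIMD width in elements, one bank per lane
    print("row, skew=1:      ", is_conflict_free(row_access(3, 5, n), n))
    print("column, skew=1:   ", is_conflict_free(column_access(3, 5, n), n))
    print("column, skew=0:   ", is_conflict_free(column_access(3, 5, n), n, skew=0))
    print("4x2 block, skew=1:", is_conflict_free(block_access(0, 0, 4, 2), n))
```

With a skew of 1, row and column accesses each touch all eight banks exactly once, while the unskewed layout sends a whole column to a single bank. The 4x2 block still conflicts under this simple scheme, which suggests why a richer mapping is needed when row, column, and block accesses must all be conflict-free with the same number of banks.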
Keywords/Search Tags: Very Wide SIMD, DSP, Parallel Memory, 2D Memory, Low-power, Vector DMA Cache, Shuffle Mode Memory, Odd-Even Partition