Researches On On-chip Parallel Data Access Techniques For SIMD DSPs With Very Wide Data Path

Posted on:2013-03-04

Degree:Doctor

Type:Dissertation

Country:China

Candidate:S Liu

Full Text:PDF

GTID:1268330392973872

Subject:Electronic Science and Technology

Abstract/Summary:


As embedded applications evolve and chip design technology advances, the Single Instruction stream Multiple Data streams (SIMD) technique, which can fully exploit application parallelism at low hardware cost, has been widely adopted in current DSPs. The data path width of SIMD DSPs has grown from 4–8 bytes to 32–64 bytes. Very Wide SIMD can bring high efficiency to the system, but Very Wide SIMD DSPs also face many memory problems, such as insufficient effective data bandwidth, frequent memory conflicts, and the high cost of shuffle operations. It is therefore important to study efficient memory designs for Very Wide SIMD DSPs.

On-chip Parallel Memory (PM) is a promising solution to most memory access problems in Very Wide SIMD DSPs. This dissertation focuses on on-chip PM issues in Very Wide SIMD DSPs, including how to reduce PM hardware cost (area and power), how to provide a unified PM framework for general-purpose DSPs, how to reduce conflicts within the PM, and how to design an efficient shuffle unit that cooperates with the PM. The main contributions are summarized as follows:

1) We propose a Bilinear Skewed Parallel 2D Memory (BilisPM). BilisPM needs as many memory modules as the SIMD width, and the width of each memory module is doubled while its depth is halved. Its bilinear mapping function gives BilisPM conflict-free access patterns (row, column, block, etc.) and circular addressing in both the X and Y directions of the 2D space. BilisPM effectively reduces the area of the on-chip PM, and its controller has a smaller chip area and a reasonable critical path delay.

2) We propose a Low-Power 2D (LP2D) memory that exploits data reuse between adjacent accesses, based on the 2D memory access characteristics of sliding-window applications.
By using an adjacent address checker to detect the correlation between adjacent 2D memory requests, LP2D generates a bank control mask that turns off some bank address generators and bank access logic. LP2D achieves a marked reduction in power consumption at a reasonable hardware cost.

3) We propose an application-domain-oriented Polymorphic Parallel Memory (PPM) scheme, based on the memory access characteristics of communication and video processing algorithms. PPM efficiently handles irregular memory accesses by combining a 1D/2D configurable PM with a two-level cooperation scheme between the memory and the registers. By trading off hardware cost against efficiency, PPM provides a unified PM framework for high-performance general-purpose DSPs. Experimental results show that PPM has moderate hardware cost and can effectively reduce code size and improve system performance.

4) We propose the Vector DMA Cache (VDC) technique to reduce conflicts within the on-chip PM caused by multiple clients. By merging scattered DMA requests into cache-line requests, VDC effectively reduces the number of vector memory (VM) accesses issued by the DMA and decreases VM access conflicts between the DMA and the Vector Processing Unit (VPU), thereby relieving VPU/DMA contention and improving system performance. The benefit of VDC becomes more pronounced as the SIMD width increases.

5) We propose a programmable shuffle unit with an efficient shuffle mode memory and introduce a novel shuffle matrix partitioning method: odd-even partition. With the shuffle mode memory and the corresponding shuffle instructions, programs execute efficiently because shuffle operations do not occupy the system's key resources: the general registers and the memory bandwidth.
The odd-even partition, which offers a smaller data-selection span and stability under circular shift, is advantageous for handling data entering or leaving the crossbar.

Most of the above techniques have been or will be applied in our FT-Matrix series DSPs, and they can serve as useful examples for high-performance DSP designers.
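Contribution 1) relies on skewing where 2D data is placed across memory modules so that one SIMD access touches each module at most once. The Python sketch below illustrates the general idea with a classic linear skewing scheme, module = (i + j) mod N. It is an assumption for illustration, not BilisPM's bilinear mapping (which the abstract does not give), and it does not model the doubled-width, halved-depth modules. N, W, H and all function names are hypothetical.

```python
# Hypothetical sketch: a classic linear skewing scheme for a 2D parallel
# memory. This is NOT BilisPM's bilinear mapping; it only shows how skewing
# removes module conflicts for some access shapes.

N = 8    # number of memory modules (assumed equal to the SIMD width)
W = 32   # array width in elements (assumed to be a multiple of N)
H = 16   # array height in elements (assumed to be a multiple of N)


def module_of(i: int, j: int) -> int:
    """Module holding element (row i, column j): row i is rotated by i."""
    return (i + j) % N


def address_in_module(i: int, j: int) -> int:
    """Word address inside that module; unique for every element."""
    return i * (W // N) + j // N


def row_access(i: int, j0: int):
    """N consecutive elements of row i, wrapping circularly in X."""
    return [(i, (j0 + k) % W) for k in range(N)]


def col_access(i0: int, j: int):
    """N consecutive elements of column j, wrapping circularly in Y."""
    return [((i0 + k) % H, j) for k in range(N)]


def block_access(i0: int, j0: int, rows: int = 2, cols: int = 4):
    """A rows x cols block (rows * cols == N) with top-left corner (i0, j0)."""
    return [((i0 + r) % H, (j0 + c) % W) for r in range(rows) for c in range(cols)]


def conflict_free(elems) -> bool:
    """True if every element of one access falls in a distinct module."""
    return len({module_of(i, j) for i, j in elems}) == len(elems)


if __name__ == "__main__":
    rows_ok = all(conflict_free(row_access(i, j)) for i in range(H) for j in range(W))
    cols_ok = all(conflict_free(col_access(i, j)) for i in range(H) for j in range(W))
    print("rows conflict-free:", rows_ok)
    print("columns conflict-free:", cols_ok)
    print("2x4 block conflict-free:", conflict_free(block_access(0, 0)))
```

Under this simple scheme every row and column access is conflict-free, including circular wrap-around when W and H are multiples of N, but the 2×4 block at (0, 0) is not, because elements (0, 1) and (1, 0) both map to module 1. Supporting block accesses as well requires a richer mapping, such as the bilinear function the abstract describes for BilisPM.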
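The adjacent-address check in contribution 2) can be pictured as comparing, bank by bank, the word each bank must deliver for the current request with the word it delivered for the previous one. The sketch below is a hypothetical model: the skewed bank placement, the row-shaped request, and the rule "only banks whose address changed are enabled" are assumptions, not LP2D's documented logic.

```python
# Hypothetical sketch of an adjacent-address check for a sliding-window
# workload. Mapping and masking rule are illustrative assumptions.

N = 8    # number of banks (assumed equal to the SIMD width)
W = 32   # row width in elements (assumed to be a multiple of N)


def bank_requests(i: int, j0: int) -> dict:
    """Return {bank: word address} for a row access of N elements starting
    at (i, j0), using the skewed placement bank = (i + j) % N.
    Assumes j0 + N <= W (no wrap-around)."""
    req = {}
    for k in range(N):
        j = j0 + k
        req[(i + j) % N] = i * (W // N) + j // N
    return req


def bank_enable_mask(prev: dict, curr: dict) -> int:
    """Bit b is set only when bank b must be read again because its address
    differs from the previous request. Banks with a clear bit could keep their
    last output and have address generation and access logic switched off."""
    mask = 0
    for bank, addr in curr.items():
        if prev.get(bank) != addr:
            mask |= 1 << bank
    return mask


if __name__ == "__main__":
    prev = bank_requests(3, 5)
    print(f"{bank_enable_mask(prev, bank_requests(3, 6)):08b}")  # slide right by 1
    print(f"{bank_enable_mask(prev, bank_requests(4, 5)):08b}")  # move down one row
```

Sliding the row window one element to the right changes only one bank's address, so the mask prints as `00000001` and the other seven banks could in principle stay idle; moving to the next row changes every bank and prints `11111111`.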
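Contribution 4) merges scattered DMA requests into cache-line requests. A minimal model of that counting effect follows, assuming a line of LINE_WORDS words (a hypothetical parameter) and ignoring timing, whereas a real design can only merge requests that are pending at the same time.

```python
# Hypothetical sketch of request coalescing in the spirit of the Vector DMA
# Cache: word requests that fall into the same line are served by a single
# line-sized VM access. LINE_WORDS is an assumed value.

LINE_WORDS = 16   # words per line (assumed)


def coalesce(word_addrs):
    """Group word addresses by line: {line number: [word offsets]}, in the
    order lines are first touched (dicts keep insertion order)."""
    lines = {}
    for addr in word_addrs:
        line, offset = divmod(addr, LINE_WORDS)
        lines.setdefault(line, []).append(offset)
    return lines


def vm_accesses(word_addrs):
    """(accesses without merging, accesses with one access per distinct line)."""
    return len(word_addrs), len(coalesce(word_addrs))


if __name__ == "__main__":
    scattered = [0, 3, 5, 17, 18, 31, 2, 33]
    print(coalesce(scattered))     # {0: [0, 3, 5, 2], 1: [1, 2, 15], 2: [1]}
    print(vm_accesses(scattered))  # (8, 3)
```

Eight scattered word requests collapse into three line accesses here, and a sequential stream of 64 words drops from 64 accesses to 4. Wider lines merge more words per access, which is one plausible reading of the abstract's remark that VDC matters more as the SIMD width grows.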
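Contribution 5) keeps shuffle patterns in a dedicated shuffle mode memory so that shuffle instructions do not consume general registers. The class below is a hypothetical software model of that idea: the `ShuffleUnit` name, the two-source indexing convention, and the table size are assumptions, and it does not model the crossbar or the odd-even partition.

```python
# Hypothetical sketch of a shuffle unit with its own mode memory. Patterns are
# written once into a small table; a shuffle then names a mode index instead
# of passing a permutation vector through a general register.


class ShuffleUnit:
    """Toy model of a shuffle unit with a private table of shuffle patterns."""

    def __init__(self, width: int, num_modes: int):
        self.width = width                      # lanes per vector
        self.mode_memory = [None] * num_modes   # one stored pattern per mode

    def load_mode(self, mode_id: int, pattern) -> None:
        """Store a pattern: output lane k takes source lane pattern[k].
        Indices 0..width-1 select from the first source, width..2*width-1
        from the second."""
        pattern = list(pattern)
        if len(pattern) != self.width or not all(0 <= p < 2 * self.width for p in pattern):
            raise ValueError("pattern needs one valid source index per output lane")
        self.mode_memory[mode_id] = pattern

    def shuffle(self, mode_id: int, a, b=None):
        """Execute a shuffle that names only a mode index, not a pattern vector."""
        pattern = self.mode_memory[mode_id]
        if pattern is None:
            raise ValueError(f"mode {mode_id} has not been loaded")
        src = list(a) + (list(b) if b is not None else [])
        if max(pattern) >= len(src):
            raise ValueError("pattern selects from a second source that was not given")
        return [src[p] for p in pattern]


if __name__ == "__main__":
    su = ShuffleUnit(width=8, num_modes=4)
    su.load_mode(0, range(7, -1, -1))              # lane reversal
    su.load_mode(1, [0, 8, 1, 9, 2, 10, 3, 11])    # interleave low halves of a and b
    print(su.shuffle(0, range(8)))                 # [7, 6, 5, 4, 3, 2, 1, 0]
    print(su.shuffle(1, range(8), range(10, 18)))  # [0, 10, 1, 11, 2, 12, 3, 13]
```

The design choice being illustrated is that the pattern is loaded once and then referenced by a small mode index, so repeated shuffles in a loop cost no register or memory bandwidth for the pattern itself.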

Keywords/Search Tags:

Very Wide SIMD, DSP, Parallel Memory, 2D Memory, Low-power, Vector DMA Cache, Shuffle Mode Memory, Odd-Even Partition


Related items

1	The Design And Implementation Of Vector Memory Unit Of Multi-Width SIMD DSP
2	The Design And Verification Of 32bit High-Performance DSP SIMD Vector Memory
3	Research On Cache Optimization Mechanism In Heterogeneous Memory Environment
4	Research Of In-Memory MapReduce System For Memory Efficiency Optimization
5	Memory Optimization On Chip Multi-core Processors
6	Design And Implementation Of SIMD Unaligned Memory Access Structure
7	Low-Power Cache Design Based On Non-Volatile Memory
8	Architectural Level Leakage Power Optimization For Cache Memory In Microprocessors
9	Optimizations Of Memory Subsystem For Chip Multiprocessor Systems
10	Research On NVM Based Main Memory Key Technology