The Design And Implementation Of Vector Memory Unit Of Multi-Width SIMD DSP

Posted on:2013-06-03

Degree:Master

Type:Thesis

Country:China

Candidate:Y G Huang

GTID:2268330392473825

Subject:Software engineering

Abstract/Summary:

Over the past decade, with the development of integrated circuits and computer technology, CPU performance has grown by nearly 60% per year, while memory access performance has improved by only 7% per year [1]. The "memory wall" problem caused by memory bandwidth and latency has become the bottleneck that restricts further improvement of microprocessor performance. A Digital Signal Processor (DSP) based on a multi-width Single Instruction-stream Multiple Data-stream (SIMD) architecture targets high-density data processing. It integrates multiple Vector Processing Elements (VPEs), which demand high memory access performance. Two questions have therefore become important in the design of a vector memory system: how to provide sufficient memory bandwidth for the VPEs, and how to reduce data shuffles and other additional operations between the VPEs of a multi-width SIMD DSP so as to improve memory access efficiency and reduce power consumption.

The YHFT-Matrix DSP, which targets Software Defined Radio (SDR) base stations, was independently researched and developed by the Microelectronics and Microprocessor research institute of the National University of Defense Technology. It adopts a 10-issue Very Long Instruction Word (VLIW) and multi-width SIMD architecture. Its Vector Processing Unit (VPU) contains 16 vector processing elements, each of which contains two multiply-add units and other ALUs. Making full use of the computing power of the VPU therefore requires high data throughput and memory bandwidth. To meet this requirement, we design and implement a novel, large-capacity on-chip vector memory (VM) that holds the large amounts of data on which the VPU operates.

The VM includes a dedicated Vector Address Generation Unit (VAGU) that supports both linear and circular addressing. The total capacity of the VM is 1M bytes. Its memory is organized as multiple banks interleaved on the low-order address bits (MIDB), with a double-buffer architecture. This organization meets the VPU's demand for parallel access to multiple vector data at a small cost in area and power, and it reduces parallel memory access conflicts. To accelerate the related communication algorithms, we also implement a Vector Access Reorder Unit (VARU) and a Vector Write-back Reorder Unit (VWRU) in the VM, so that the VM supports non-aligned and conditional vector memory accesses. With this method, all the VPEs of the VPU can share the VM in a finite manner and access it conditionally. The VM has achieved its design targets of 512 Gbps for vector data access, 256 Gbps for DMA data access, and 32 Gbps for scalar data access. After later logic optimization, the VM can also sustain continuous vector byte and halfword accesses.

The YHFT-QMBase chip, which is built from four YHFT-Matrix DSPs, has been successfully taped out. Logic verification and testing showed that the VM functions correctly. The operating frequency of the VM reaches 500 MHz or above, and it can reach 700 MHz after logic optimization. The MIDB architecture of the VM significantly reduces access conflicts. Finite sharing and conditional vector memory access can reduce or eliminate the shuffle operations of the related algorithms, improve code density, and accelerate algorithm implementation.
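As a rough illustration of the ideas named in the abstract (low-order-bit bank interleaving, and linear versus circular address generation), the following Python sketch models how a vector request maps onto banks and how many cycles it would need if each bank serves one row per cycle. It is not the thesis design: the bank count, the word width, the one-row-per-cycle model, and all function names are assumptions chosen for illustration, since the abstract does not state them.

```python
NUM_BANKS = 16   # hypothetical: the abstract does not give the bank count
WORD_BYTES = 4   # hypothetical: width of one bank word in bytes


def bank_and_row(byte_addr):
    """Low-order interleaving: the low bits of the word index pick the bank."""
    word = byte_addr // WORD_BYTES
    return word % NUM_BANKS, word // NUM_BANKS


def linear_addresses(base, stride, count):
    """Linear addressing: base, base + stride, base + 2 * stride, ..."""
    return [base + i * stride for i in range(count)]


def circular_addresses(base, stride, count, buf_start, buf_len):
    """Circular addressing: addresses wrap inside [buf_start, buf_start + buf_len)."""
    return [buf_start + (base - buf_start + i * stride) % buf_len
            for i in range(count)]


def conflict_cycles(addrs):
    """Cycles needed if each bank can serve one distinct row per cycle (assumed model)."""
    rows_per_bank = {}
    for addr in addrs:
        bank, row = bank_and_row(addr)
        rows_per_bank.setdefault(bank, set()).add(row)
    return max(len(rows) for rows in rows_per_bank.values())


unit_stride = linear_addresses(0, WORD_BYTES, 16)            # 16 consecutive words
bank_stride = linear_addresses(0, WORD_BYTES * NUM_BANKS, 16)  # every word in bank 0
wrapped = circular_addresses(56, 4, 4, buf_start=0, buf_len=64)
```

Under these assumptions, the unit-stride request touches each bank once and completes in one cycle, whereas a stride equal to the number of banks sends every element to the same bank and serializes the request. This is the kind of conflict that interleaving on the low-order address bits avoids for consecutive accesses.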

Keywords/Search Tags:

SIMD, VM, Memory Conflict, VARU, VWRU, Finite Share, Shuffle, Vector Conditional Access

Related items

1	The Design And Verification Of 32bit High-Performance DSP SIMD Vector Memory
2	Researches On On-chip Parallel Data Access Techniques For SIMD DSPs With Very Wide Data Path
3	Research Of SIMD Vectorization Optimization Based On Memory Access
4	Design And Implementation Of 64-bit SIMD BP Component And Shuffle Unit In X-DSP
5	The Design Of A 64-bit High-Performance DSP Parallel Memory Unit
6	Design And Implementation Of SIMD Unaligned Memory Access Structure
7	Compilation Optimization On Compiler Directive And Conditional Branch For SIMD
8	The Design And Optimization Of A Parallel Vector Memory
9	The Design And Implement Of Vector Memory To Support Gather/Scatter
10	Design And Implementation Of The Scalar And Vector Scratch Pad Memory On GX64-DSP Chip