The Design And Optimization Of A Parallel Vector Memory

Posted on:2015-03-11

Degree:Master

Type:Thesis

Country:China

Candidate:S L Yan

GTID:2308330479979287

Subject:Software engineering

Abstract/Summary:

The rapid development of modern communication technology and the growth of media information have placed greater demands on the performance of digital signal processors (DSPs). The continuing need for high-speed, real-time information processing has driven the development of ultra-wide SIMD. How to supply efficient, high-bandwidth memory access to a SIMD-based DSP with multiple arithmetic units has become an important issue in storage system design. This thesis designs and implements a high-bandwidth parallel vector access memory (AM) for Matrix2, a high-performance DSP with 16-way SIMD arithmetic units, and studies how to improve the memory access performance of SIMD processors in specific applications. The main work and innovations are as follows:

1. Based on the Matrix2 instruction set architecture, a set of multi-granularity vector access instructions is designed, supporting half-word (4B), word (8B) and double-word (16B) accesses. A set of special access instructions for accelerating the FFT algorithm is also provided.
2. Two vector memory instructions can execute in parallel, each with a data bandwidth of up to 256B/cycle. This dual access supplies sufficient data bandwidth to the SIMD arithmetic units.
3. Non-aligned SIMD access is supported at both word and double-word granularity, improving the efficiency and flexibility of vector access.
4. DMA access runs in parallel with instruction accesses, achieving four-way parallel memory access (two vector memory instructions, DMA read and DMA write) with a low conflict rate. An arbitration mechanism with configurable priority, combined with a dedicated memory organization, reduces interference from DMA transfers with the parallel access instructions and shortens program execution time.
5. An extensible, low-hardware-cost pipeline scheduling controller for memory accesses is designed to ensure correct execution of SIMD instructions.
6. An interface is designed to match the bandwidth between AM and DMA, increasing the bandwidth utilization of AM.

Finally, a module-level testbench based on the SystemVerilog verification methodology is built, and module-level verification of AM is completed; the results show that the new testbench improves verification efficiency. Functional verification of AM is also completed in the Matrix2 system-level verification environment, with total code coverage reaching 100%. System-level tests show that the FFT speed-up ratio ranges from 1.29 to 2.26 for different numbers of FFT points. In addition, AM is synthesized and optimized in a 40 nm technology, and the results show that the design meets all performance requirements.
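The bank organization behind AM's non-aligned access (item 3) is not described here. A common way to make a non-aligned vector load conflict-free is low-order interleaving across as many banks as there are SIMD lanes, plus a rotation network. The following is a minimal sketch under that assumption, working at element granularity. `NUM_BANKS`, `BankedMemory` and `vector_load` are hypothetical names, and 16 banks is chosen only to match Matrix2's 16-way SIMD.

```python
from typing import List

NUM_BANKS = 16  # hypothetical: one bank per SIMD lane (Matrix2 has 16-way SIMD)


class BankedMemory:
    """Hypothetical low-order-interleaved memory: element address `addr`
    lives in bank addr % NUM_BANKS at row addr // NUM_BANKS."""

    def __init__(self, rows: int) -> None:
        self.banks: List[List[int]] = [[0] * rows for _ in range(NUM_BANKS)]

    def write(self, addr: int, value: int) -> None:
        self.banks[addr % NUM_BANKS][addr // NUM_BANKS] = value

    def vector_load(self, addr: int) -> List[int]:
        """Load NUM_BANKS consecutive elements starting at any (possibly
        non-aligned) element address, reading each bank exactly once."""
        offset = addr % NUM_BANKS
        base_row = addr // NUM_BANKS
        # Banks below `offset` hold the tail of the vector, so they read the next row.
        bank_out = [
            self.banks[b][base_row + (1 if b < offset else 0)]
            for b in range(NUM_BANKS)
        ]
        # Rotate from bank order to lane order: lane i takes bank (offset + i) % NUM_BANKS.
        return bank_out[offset:] + bank_out[:offset]


# Usage: fill memory with its own addresses, then load from a non-aligned address.
mem = BankedMemory(rows=4)
for a in range(4 * NUM_BANKS):
    mem.write(a, a)
unaligned = mem.vector_load(5)  # elements 5..20, one read per bank
```

Because any 16 consecutive element addresses fall into 16 distinct banks, every bank is read exactly once whatever the alignment. The cost shifts to per-bank row selection and the rotation network.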
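Item 4 mentions arbitration with configurable priority among four concurrent requesters. The details of AM's arbiter and memory organization are not given. The following is a minimal sketch that assumes each request targets a single memory block per cycle. The port names `VM0`, `VM1`, `DMA_RD`, `DMA_WR` and the function `arbitrate` are hypothetical. The sketch shows how changing the priority order decides whether an instruction or a DMA transfer stalls on a conflict.

```python
from typing import Dict, Sequence, Set

# Hypothetical port names for the four requesters listed in item 4,
# ordered so that the two vector memory instructions outrank DMA.
DEFAULT_PRIORITY = ("VM0", "VM1", "DMA_RD", "DMA_WR")


def arbitrate(requests: Dict[str, int], priority: Sequence[str]) -> Dict[str, bool]:
    """Grant at most one requester per memory block in a single cycle.

    requests: requester name -> index of the memory block it targets this cycle
              (a simplification; a real vector access may span several banks).
    priority: requester names from highest to lowest; this order is the
              configurable part. Every requester in `requests` must appear here.
    Returns requester name -> True if granted, False if it stalls this cycle.
    """
    missing = set(requests) - set(priority)
    if missing:
        raise ValueError(f"requesters without a priority: {sorted(missing)}")
    granted: Dict[str, bool] = {}
    busy_blocks: Set[int] = set()
    for name in priority:
        if name not in requests:
            continue  # this port is idle this cycle
        block = requests[name]
        if block in busy_blocks:
            granted[name] = False  # conflict: a higher-priority port already owns the block
        else:
            busy_blocks.add(block)
            granted[name] = True
    return granted


# Usage: VM0 and DMA_RD both target block 0 in the same cycle.
reqs = {"VM0": 0, "VM1": 1, "DMA_RD": 0, "DMA_WR": 2}
insn_first = arbitrate(reqs, DEFAULT_PRIORITY)                   # DMA_RD stalls
dma_first = arbitrate(reqs, ("DMA_RD", "DMA_WR", "VM0", "VM1"))  # VM0 stalls
```

Giving instructions priority keeps the SIMD pipeline moving and defers background DMA traffic. Making the order configurable lets software favor DMA when a transfer is on the critical path.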

Keywords/Search Tags:

SIMD, Parallel Vector Access, Access Conflict, Non-aligned Access, FFT

Related items

1	Design And Implementation Of The Scalar And Vector Scratch Pad Memory On GX64-DSP Chip
2	Research Of SIMD Vectorization Optimization Based On Memory Access
3	The Design Of A 64-bit High-Performance DSP Parallel Memory Unit
4	Researches On On-chip Parallel Data Access Techniques For SIMD DSPs With Very Wide Data Path
5	The Design And Implement Of Vector Memory To Support Gather/Scatter
6	Design And Implementation Of SIMD Unaligned Memory Access Structure
7	The Design And Implementation Of Vector Memory Unit Of Multi-Width SIMD DSP
8	Research On Massive Access In Narrowband Internet Of Things
9	A Comparative Study Of Chinese And American Open Access Policies
10	Contention Random Access Procedure Implementation For The LTE-TDD