
The Design And Optimization Of A Parallel Vector Memory

Posted on: 2015-03-11 | Degree: Master | Type: Thesis
Country: China | Candidate: S L Yan | Full Text: PDF
GTID: 2308330479979287 | Subject: Software engineering
Abstract/Summary:
The rapid development of modern communication technology and the growth of media information place ever greater demands on the performance of digital signal processors (DSPs). The continuing need for faster, real-time information processing has driven the development of ultra-wide SIMD. How to supply data efficiently and at high bandwidth to an SIMD-based DSP with multi-way arithmetic units has become an important issue in storage system design. This thesis designs and implements a high-bandwidth parallel vector access memory (AM) for Matrix2, a high-performance DSP with 16-way SIMD arithmetic units. It also studies how to improve the memory access performance of SIMD processors in particular applications. The main focus and innovations of this thesis are reflected in the following aspects:

1. Based on the Matrix2 instruction set architecture, a set of multi-granularity vector access instructions is designed, supporting half-word (4B), word (8B), and double-word (16B) accesses. A set of special access instructions for accelerating the FFT algorithm is also provided.
2. Two parallel vector memory instructions are supported, and the data bandwidth of each instruction is as high as 256B/cycle. This dual access provides sufficient data bandwidth to the SIMD arithmetic units.
3. Non-aligned SIMD access is supported. Implementing non-aligned SIMD access at both word and double-word granularity improves the efficiency and flexibility of vector access.
4. Parallel DMA access is supported, realizing four-way parallel memory access (two vector memory instructions, DMA read, and DMA write) with a low collision rate. An arbitration mechanism with configurable priority and a special memory organization reduce the interference of DMA transfers with the parallel access instructions and decrease program execution time.
5. An extensible access pipeline scheduling controller with low hardware cost is designed to ensure the correct execution of SIMD instructions.
6. An interface is designed to match the bandwidth between AM and DMA, increasing the bandwidth utilization of AM.

Finally, a module-level testbench based on the SystemVerilog verification methodology is built, and module-level verification of AM is carried out. The results demonstrate that the new testbench improves verification efficiency. Functional verification of AM is also completed in the Matrix2 system-level verification environment. The total code coverage reaches 100%. System-level test results show that the speedup of the FFT operation ranges from 1.29 to 2.26, depending on the number of points. In addition, AM is synthesized and optimized in a 40 nm technology, and the results show that the design meets all the performance requirements.
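
The parallel and non-aligned vector access described in the abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the abstract does not describe how AM is organized internally, so the bank count, the low-order interleaving, and every function name below are hypothetical. The model shows why a unit-stride access across 16 lanes can stay conflict-free even when its base address is not aligned, and why such an access spans two adjacent rows. It also checks one reading of the bandwidth figure that is consistent with the abstract: 16 lanes at double-word (16B) granularity gives 256B per cycle.

```python
# Illustrative model only. The bank count, the interleaving scheme, and all
# names here are assumptions, not the thesis's actual AM organization.
LANES = 16       # 16-way SIMD, as stated in the abstract
NUM_BANKS = 16   # assumed: one bank per lane, low-order interleaved


def bank_and_row(element_addr):
    """Map an element address to (bank, row) under low-order interleaving."""
    return element_addr % NUM_BANKS, element_addr // NUM_BANKS


def vector_access(base, stride=1):
    """Return the (bank, row) pair each lane touches for a strided access."""
    return [bank_and_row(base + lane * stride) for lane in range(LANES)]


def is_conflict_free(requests):
    """An access is conflict-free when no two lanes target the same bank."""
    banks = [bank for bank, _ in requests]
    return len(set(banks)) == len(banks)


def rows_touched(requests):
    """Distinct rows involved in the access, in ascending order."""
    return sorted({row for _, row in requests})


def peak_bytes_per_cycle(element_bytes, lanes=LANES):
    """Peak bytes one vector instruction moves per cycle at a granularity."""
    return element_bytes * lanes


aligned = vector_access(base=0)     # starts on a row boundary
unaligned = vector_access(base=5)   # starts in the middle of a row
strided = vector_access(base=0, stride=2)
```

In this model the non-aligned access still maps each lane to a distinct bank, but it reads from two adjacent rows, so each bank needs its own row address. The stride-2 access, by contrast, sends pairs of lanes to the same bank, which is the kind of access conflict that a memory organization and arbitration scheme have to handle.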
Keywords/Search Tags: SIMD, Parallel Vector Access, Access Conflict, Non-aligned Access, FFT