The Orchestration Of Instruction Issuing In Data Parallel Processors

Posted on:2014-05-17

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Y H Wang

GTID:1228330422974080

Subject:Electronic Science and Technology

Abstract/Summary:

The performance of microprocessors has grown 1,000-fold over the past 20 years, driven by advances in VLSI technology and architecture. However, there is still a growing gap between microprocessor performance and application demand. The situation is getting worse because improvements in VLSI technology are diminishing and the power envelope is tight. Fortunately, the emergence of data parallel processors (DPPs), which combine the best attributes of the multi-core, SIMD and VLIW schemes, has brought some hope to the microprocessor industry. Nevertheless, DPPs still have problems which, if solved, would greatly improve microprocessor performance under reasonable power consumption. This dissertation focuses on the critical problems in DPPs and tries to solve them through the orchestration of instruction issuing. We strengthen the orchestration of instruction issuing in DPPs from two directions: one is a power-conscious performance model, which provides valuable insights into the combination of the multi-core, SIMD and VLIW issuing schemes; the other is a set of efficient architectural schemes, which overcome the bottleneck effects and achieve efficient cooperation among the abundant hardware resources in DPPs. The main contributions of this dissertation are as follows:

1). We build a power-conscious performance model for efficient combinations of instruction issuing schemes such as multi-core, SIMD and VLIW. The model is based on the rationale of Amdahl’s law and provides valuable insights for DPP design. It helps architecture designers determine proper parameters, including core count, SIMD width and VLIW length. The performance model also pinpoints the critical bottleneck effects in DPPs.

2). We present the dual-core framework, which efficiently eliminates the bottleneck effect of scalar processing and provides efficient cooperation between control and computation. The dual-core framework has three main components: the kernel-level pipeline, dynamic couple/decouple, and coherent branching. The kernel-level pipeline scheme exposes a large amount of parallelism between scalar and parallel processing, and the dynamic couple/decouple mechanism then exploits this parallelism. The coherent branching and shared register schemes further improve the performance of the dual-core framework when it runs in the tightly coupled mode.

3). We propose the instruction shuffle scheme, which efficiently solves the branch divergence problem. The instruction shuffle scheme lets different branch paths execute concurrently in multiple SIMD lanes and achieves MIMD-like performance while maintaining the efficiency of SIMD architectures. It serves as a bridge between SIMD and MIMD architectures.

4). We refine the instruction shuffle scheme into the multiple SIMD multiple data (MSMD) architecture. The MSMD architecture supports not only branch structures but also concurrent multiple kernels with different SIMD widths. Moreover, the MSMD architecture upgrades the instruction buffer scheduling heuristic of the instruction shuffle scheme, which greatly improves overall performance.

5). We combine the dual-core framework and the MSMD architecture into an orchestral instruction-issuing scheme. This scheme systematically addresses the problems above, including scalar processing, branch divergence and concurrent multiple SIMD. We have also evaluated the orchestral instruction-issuing scheme in the RTL-level environment of an in-house full-core implementation. The evaluation results show that the scheme achieves efficient cooperation of hardware resources at reasonable cost.

Data parallel processors remain a hot topic, and many critical problems still need both systematic and practical study. In this dissertation, we first develop an insightful performance model and then, based on that model, propose efficient architectural schemes to eliminate critical bottleneck effects. The evaluation results show that these schemes are practical and can be used in future data parallel processors.
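
The first contribution rests on the rationale of Amdahl’s law. As a rough illustration of why scalar processing becomes the bottleneck in a DPP, the following sketch applies the textbook form of Amdahl’s law. It is not the dissertation’s model: the function name, the assumption that the parallel part speeds up by the product of core count, SIMD width and VLIW length, and the omission of any power term are all simplifications introduced here.

```python
def amdahl_speedup(parallel_fraction: float, cores: int,
                   simd_width: int, vliw_length: int) -> float:
    """Textbook Amdahl's law (hypothetical helper, not the thesis model).

    Assumes the parallel part of the work is accelerated ideally by
    cores * simd_width * vliw_length and the scalar part is not
    accelerated at all. Power consumption is ignored.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if min(cores, simd_width, vliw_length) < 1:
        raise ValueError("cores, simd_width and vliw_length must be >= 1")

    ideal_parallel_speedup = cores * simd_width * vliw_length
    scalar_fraction = 1.0 - parallel_fraction
    return 1.0 / (scalar_fraction + parallel_fraction / ideal_parallel_speedup)


if __name__ == "__main__":
    # With 5% scalar work, speedup is capped at 1 / 0.05 = 20 no matter
    # how much parallel hardware is added.
    for cores in (1, 4, 16):
        print(cores, amdahl_speedup(0.95, cores, simd_width=16, vliw_length=4))
```

Under these assumptions, a workload with 5% scalar code can never exceed a 20-fold speedup regardless of core count, SIMD width or VLIW length, which is the kind of bottleneck the scalar-processing schemes above are meant to remove.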

Keywords/Search Tags:

DPP, Multi-core, SIMD, VLIW, Instruction Shuffle, Dual-core Framework, MSMD

Related items

1	Instruction-flow Scheduling Mechanism For High-performance SIMD DSP
2	Research On Vectorization Technology For Multi-cluster And VLIW DSP
3	Design And Implementation Of 64-bit SIMD BP Component And Shuffle Unit In X-DSP
4	Design and analysis of time-predictable single-core and multi-core processors
5	ILP-SIMD: An instruction parallel SIMD architecture with short-wire interconnects
6	Design Of Configurable And Extensible Media Processor
7	Design And Implementation Of The Instruction Fetch Unit And Multiple Instruction Flows Extension In The YHFT-Matrix DSP
8	The SIMD Compiler Optimization Methods Research
9	Investigation On Basic Block Scheduling Optimization For Predicate Execution VLIW DSP
10	The Design And Implementation Of Vector Memory Unit Of Multi-Width SIMD DSP