
Design And Implementation Of RISC-V Vector Coprocessor For Edge Computing

Posted on: 2023-11-17  Degree: Master  Type: Thesis
Country: China  Candidate: J Y Su  Full Text: PDF
GTID: 2558306908454444  Subject: Integrated circuit system design
Abstract/Summary:
With the rapid development of IoT technology, edge computing and artificial intelligence, embedded processors at the network edge must take on increasingly computation-intensive applications such as image processing, encryption and decryption algorithms, and CNN inference, while conventional processors often perform poorly in such scenarios and must resort to hardware acceleration. Dedicated accelerator chips, FPGAs or SoC acceleration peripherals can effectively improve processing efficiency, but they serve a narrow range of application scenarios and carry a high hardware-customization cost. By adding a single-instruction, multiple-data (SIMD) vector architecture to an embedded processor, hardware acceleration for a variety of edge-computing applications can be realized on one hardware architecture, which is of great value for reducing the application cost of embedded processors and improving their performance on computation-intensive workloads.

This thesis proposes a vector coprocessor architecture supporting dynamic scheduling, based on the RISC-V vector instruction-set extension, which combines instruction-level and data-level parallelism to provide hardware acceleration for embedded processors in edge-computing scenarios. In the dynamic-scheduling architecture, the issue queue issues ready instructions for execution out of order, reducing the performance loss caused by RAW data hazards stalling the pipeline; a register-renaming mechanism eliminates WAW and WAR hazards; and a reorder buffer ensures correct program results by committing the out-of-order-executed instructions in program order. The design implements four classes of instructions: configuration, vector memory access, vector integer and vector floating-point operations.

The configuration instructions set the vector element width, vector length and register grouping by writing control and status registers, adapting the hardware to different algorithmic applications. The memory-access unit implements three addressing modes (unit-stride, strided and indexed), optimizes unit-stride accesses aligned to 128 bits, and provides a data bypass that speeds up accesses when a load-after-store (LAS) hazard occurs. The integer execution unit implements addition, subtraction, comparison, logical, select, shift, fixed-point and reduction instructions, and reduces circuit area by reusing computation circuitry. The floating-point execution unit implements addition, multiplication, fused multiply-add and select instructions, and optimizes the critical path of the fused multiply-add unit using Chisel's register-retiming annotation.

The proposed vector coprocessor is simulated and verified on the VCS simulation platform. The correctness of each implemented instruction is verified with dedicated functional test cases, and the robustness of the overall architecture, including the dynamic-scheduling structure, is verified with random-instruction stress tests. To evaluate the acceleration of edge-computing applications, the design is further tested on an FPGA verification platform with both integer and floating-point workloads. The integer benchmark vvadd is implemented with vector intrinsics and with inline assembly, achieving average speedup ratios of 1.45 and 3.42, respectively, over the scalar-pipeline execution time. The floating-point test case uses inline assembly to vectorize the convolution, pooling and fully connected layers of a neural-network application, achieving a speedup ratio of 4.25 and a 29.7% reduction in code size compared with the scalar implementation.

Finally, the design is synthesized with a TSMC 22 nm process library and the critical timing paths are optimized according to the synthesis report. The results show that the proposed vector coprocessor runs at up to 917 MHz with an area of 180,841 μm², and the design passes formal verification.
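The dynamic-scheduling mechanism summarized in the abstract (out-of-order issue from an issue queue, register renaming, in-order commit through a reorder buffer) can be illustrated with a toy software model. This is a behavioral sketch for intuition only, not the thesis's RTL: the instruction format, the register counts and the one-sweep-per-cycle issue policy are all assumptions.

```python
def run_ooo(instrs, num_arch_regs=8):
    """Toy out-of-order machine.

    instrs: list of (dest, src1, src2) architectural-register triples.
    Returns (issue_order, commit_order) as lists of instruction indices.
    """
    rename = {r: r for r in range(num_arch_regs)}  # arch -> physical map
    next_phys = num_arch_regs
    ready = set(range(num_arch_regs))              # physical regs holding values

    # Rename stage: every write gets a fresh physical register, so WAW and
    # WAR hazards disappear; only true RAW dependences remain.
    renamed = []
    for dest, s1, s2 in instrs:
        p1, p2 = rename[s1], rename[s2]
        pd, next_phys = next_phys, next_phys + 1
        rename[dest] = pd                          # later readers see the new copy
        renamed.append((pd, p1, p2))

    issue_order, done = [], set()
    while len(done) < len(renamed):
        snapshot = set(ready)                      # results visible next "cycle"
        for i, (pd, p1, p2) in enumerate(renamed):
            if i not in done and p1 in snapshot and p2 in snapshot:
                done.add(i)                        # operands ready: issue now
                ready.add(pd)                      # wake up dependent instructions
                issue_order.append(i)

    commit_order = sorted(done)                    # ROB retires in program order
    return issue_order, commit_order
```

With `run_ooo([(1, 0, 0), (2, 1, 1), (3, 0, 0)])`, the independent third instruction issues before the second (which waits on a RAW dependence through register 1), giving issue order `[0, 2, 1]`, yet the reorder buffer still commits `[0, 1, 2]`, which is the correctness argument the abstract makes for in-order commit.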
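The configuration step can likewise be sketched. The model below shows how a vsetvli-style configuration instruction derives the active vector length `vl` from the requested element count (AVL), the element width (SEW) and the register grouping (LMUL), using the common `vl = min(AVL, VLMAX)` policy of the RISC-V vector specification, and how a stripmined loop such as the vvadd benchmark would use it. The 128-bit VLEN echoes the alignment figure in the abstract, but the policy choice and the loop itself are illustrative assumptions, not the thesis's implementation.

```python
def set_vl(avl, sew, lmul, vlen=128):
    """Model of a vsetvli-style instruction: vl = min(AVL, VLMAX)."""
    vlmax = (vlen * lmul) // sew        # elements in one LMUL register group
    return min(avl, vlmax)

def vvadd(a, b, sew=32, lmul=4, vlen=128):
    """Stripmined element-wise add: a scalar model of the vvadd benchmark."""
    out, i = [], 0
    while i < len(a):
        vl = set_vl(len(a) - i, sew, lmul, vlen)   # configuration instruction
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))  # vadd.vv
        i += vl
    return out
```

For example, with 32-bit elements, LMUL = 4 and VLEN = 128, VLMAX is 128 × 4 / 32 = 16, so `set_vl(100, 32, 4)` returns 16 and the loop handles 16 elements per iteration until fewer remain.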
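The three memory-access modes named in the abstract have simple reference semantics, sketched below over a flat Python list standing in for memory. Element-granularity addressing (rather than byte addressing scaled by SEW) is an assumption made to keep the sketch short.

```python
def unit_stride_load(mem, base, vl):
    """Unit-stride: vl consecutive elements starting at base."""
    return [mem[base + i] for i in range(vl)]

def strided_load(mem, base, stride, vl):
    """Strided: constant-stride access, e.g. walking a matrix column."""
    return [mem[base + i * stride] for i in range(vl)]

def indexed_load(mem, base, indices):
    """Indexed (gather): per-element offsets taken from an index vector."""
    return [mem[base + j] for j in indices]
```

Unit-stride is the case the thesis optimizes for 128-bit-aligned addresses, since all elements fall in one or two contiguous memory beats; strided and indexed accesses generally decompose into per-element requests.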
Keywords/Search Tags: RISC-V, embedded processor, vector architecture, dynamic scheduling