Research on Key Technology in FPGA-ASIC Heterogeneous Computing System Toward Agile Hardware Design

Posted on: 2024-08-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W H Ma
Full Text: PDF
GTID: 1528307097954329
Subject: Circuits and Systems
Abstract/Summary:
As technology develops rapidly, applications and algorithms demand ever higher computing performance, and many new algorithms outperform their predecessors on many tasks. Meanwhile, Moore's Law is approaching its end, so it is becoming hard to improve the performance of general-purpose processors. More customized systems are therefore employed to meet the performance demand. However, customized systems are highly complex to design and scale poorly. System design must address many concerns, including the hardware and software interface, the network architecture, and so on, and the complexity of these units harms design efficiency. To address these problems, this thesis investigates design methods for the key subsystems of FPGA-ASIC based heterogeneous accelerating systems and proposes a heterogeneous system platform for agile design. The platform mainly includes:

1. Low-latency hierarchical on-chip/off-chip networks. To meet the performance demand of complicated systems, we design a hierarchical network based on a ring NoC. The network can be extended on chip or between chips without modifying the communication mechanism. To improve performance, a bypass channel is attached to each router, so that a packet can traverse multiple nodes in one cycle. The network latency is reduced by 50.15% in the ring topology and by 33% in the mesh topology.

2. A memory-pool-based low-latency memory architecture with near-memory computing units. The memory subsystem is one of the core components of a digital system. To improve memory access performance, we propose a low-latency memory architecture based on a memory pool. A memory pool consists of several physical memory channels and connects to different nodes in the system through virtual memory channels. These connections can be customized according to the data flow, which improves memory access performance. In addition, a near-memory processor (NMP) is attached to each physical memory channel, and data-intensive tasks can be accelerated by designing a custom processing element in the NMP. With the NMP, the power needed to compute the fully connected layers of a CNN is reduced by 40%.

3. A universal hardware interface for accelerators and an efficient controller. The design of the hardware and software interface for accelerators is one of the key points of system design, and it also affects design efficiency. We propose a hardware interface with resource abstraction and a software-based task controller. Accelerator designers can use the standard cells provided by the platform to improve design efficiency. For power considerations, an efficient processor with a variable architecture is also proposed. In low-power mode, the processor saves more than 50% of the energy for the same task.

4. A heterogeneous CNN accelerating architecture. An agile design methodology is discussed in this thesis, and three processing elements are proposed for input-layer, middle-layer, and fully connected layer convolution in CNN algorithms. These processing elements are implemented on FPGA and ASIC according to their performance requirements. The FPGA-based input-layer convolution accelerator improves computing efficiency by more than 15% and energy efficiency by 44%. The ASIC-based middle-layer convolution accelerator reaches a peak performance of 6.4 TOPS with an energy efficiency of 3.77 TOPS/W.

In this thesis, the key technologies of FPGA-ASIC heterogeneous computing systems have been studied, and the factors that affect design efficiency and system performance have been discussed. We also design an FPGA-ASIC heterogeneous computing system for CNN acceleration. This work will be beneficial for the agile design of heterogeneous hardware. Future work may focus on improving the performance of the network and memory subsystems and on building the system with high-level languages.
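The effect of the bypass channel described in item 1 can be illustrated with a toy latency model. This is only a sketch under stated assumptions, not the thesis's router design: it assumes a bidirectional ring, one cycle per hop without bypass, and a hypothetical parameter `hops_per_cycle` for how many nodes a packet can cross in one cycle when bypass is enabled. The function names are illustrative, and the model is not expected to reproduce the reported 50.15% figure.

```python
from math import ceil


def ring_hops(src: int, dst: int, n: int) -> int:
    """Shortest hop count between two nodes on a bidirectional ring of n nodes (assumed topology)."""
    clockwise = (dst - src) % n
    return min(clockwise, n - clockwise)


def latency_cycles(hops: int, hops_per_cycle: int = 1) -> int:
    """Cycles to cover `hops` when up to `hops_per_cycle` nodes are crossed per cycle.

    hops_per_cycle = 1 models a ring without bypass; larger values model
    a bypass channel (hypothetical parameter, not taken from the thesis).
    """
    if hops_per_cycle < 1:
        raise ValueError("hops_per_cycle must be at least 1")
    return ceil(hops / hops_per_cycle)


def average_latency(n: int, hops_per_cycle: int = 1) -> float:
    """Mean latency in cycles over all ordered pairs of distinct nodes."""
    total = sum(
        latency_cycles(ring_hops(s, d, n), hops_per_cycle)
        for s in range(n)
        for d in range(n)
        if s != d
    )
    return total / (n * (n - 1))


if __name__ == "__main__":
    nodes = 8
    base = average_latency(nodes, hops_per_cycle=1)
    bypass = average_latency(nodes, hops_per_cycle=2)
    print(f"no bypass: {base:.3f} cycles, bypass: {bypass:.3f} cycles")
    print(f"reduction: {100 * (1 - bypass / base):.1f}%")
```

In this simplified model the saving depends only on ring size and on how many hops fit in a cycle; a real router would also have to account for arbitration, contention, and wire delay.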
Keywords/Search Tags: heterogeneous computing, hierarchical network-on-chip, near memory computing, variable architecture processor, CNN accelerator