Towards The Design Of High-Performance And Low-Cost Heterogeneous Pipelining Systems

Posted on: 2020-08-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W W Jiang
GTID: 1368330599453390
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of intelligent computing, such as computer vision, speech recognition, and natural language processing, and with applications becoming ever more complicated, the demand for computational energy efficiency (i.e., the computational performance per unit of energy consumed) has been increasing sharply. How to design highly parallel, energy-efficient systems for Artificial Intelligence (AI) applications has therefore attracted the attention of researchers worldwide. The heterogeneous pipelining architecture has become one of the most promising solutions for such systems. Take Deep Neural Networks (DNNs) as an example: as the depth of the network increases, a pipelined design can significantly improve performance; meanwhile, because computation requirements differ markedly among DNN layers, a heterogeneous platform can provide a tailored design for each layer to maximize energy efficiency.

Although there is already a large body of work on Neural Architecture Search (NAS) at the application level and on Hardware Design Search (HDS) for heterogeneous accelerators, existing work explores the NAS and HDS spaces independently, which limits the efficiency of the overall system. To fill this gap, this thesis focuses on software and hardware co-design. We study collaborative design and optimization for different applications and hardware platforms. Unlike existing work, we thoroughly explore the software and hardware design spaces to exploit the parallelism in the application and maximize hardware utilization. The techniques developed in this thesis identify the optimal pipelining systems for three kinds of platforms: (1) clusters of Field-Programmable Gate Arrays (FPGAs), (2) Multi-Processor Systems-on-Chip (MPSoCs) based on general-purpose computing resources, and (3) large-scale distributed self-timed systems.

The major contributions of this thesis are as follows:

(1) We take Convolutional Neural Networks (CNNs) and FPGAs as a vehicle to study heterogeneous pipelining system design for tasks with fixed execution time. We first propose deploying CNNs on heterogeneous FPGAs and devise dynamic programming based algorithms to partition CNNs across the FPGAs. Then, we optimize each CNN layer to exploit fine-grained parallelism. By partitioning CNN layers into parallel execution blocks and balancing computation and communication among the N FPGAs, we can achieve an N-fold performance improvement, reaching super-linear performance. On top of this, we open the NAS space in the design phase and co-explore the NAS and HDS spaces to identify the best neural architectures together with their tailored FPGA implementations.

(2) For MPSoCs, we study the uncertainties in such platforms and develop techniques to design heterogeneous pipelined MPSoCs. We first propose a probabilistic pipeline model, which describes the execution time of tasks using random variables. Targeting different structures in an application, we devise algorithms based on dynamic programming and Pareto frontiers to find the optimal solution for simple structures (i.e., paths and trees); then, we further combine dynamic programming with linear programming to obtain a (1+ε)-approximation algorithm. The identified systems satisfy the timing requirement with the required guaranteed probability while achieving the minimum total cost.

(3) Finally, for large-scale distributed systems, we incorporate asynchronous communication among pipeline stages. We study the fundamental problems, including how to model run-time behavior, how to detect deadlocks in the system, and how to avoid deadlocks. Based on this knowledge, we devise efficient dynamic programming algorithms to synthesize heterogeneous asynchronous pipeline systems.

The proposed frameworks and algorithms are evaluated experimentally, and the results verify the effectiveness and efficiency of the proposed techniques. For the NP-hard optimization problems, our proposed algorithms find optimal or near-optimal solutions in pseudo-polynomial time. Compared with the integer linear programming approach, we find the same optimal solutions with up to 10,000 times less elapsed time. Compared with state-of-the-art heuristic algorithms, we find that in many cases they cannot find any feasible solution, whereas our techniques find the optimal solutions. We believe that the proposed frameworks and optimization algorithms can support the imminent emergence of a large number of AI applications deployed on different platforms, including embedded systems, cloud computing platforms, and IoT devices, and can promote the development of intelligent computing technologies.
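
Contribution (1) refers to dynamic programming based partitioning of CNN layers across heterogeneous FPGAs. The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea under simplifying assumptions: layers form a chain, each device runs one contiguous (possibly empty) block in a fixed device order, stage latency is modeled as work divided by device speed, and communication cost is ignored. The function name `partition_layers` and its parameters are hypothetical and do not come from the thesis.

```python
from itertools import accumulate


def partition_layers(layer_ops, device_speeds):
    """Split a chain of layers into contiguous blocks, one per device
    (in the given device order), minimizing the slowest stage's latency.

    Assumed model (not from the thesis): latency = block work / device speed,
    with no communication cost between stages.
    """
    n, m = len(layer_ops), len(device_speeds)
    prefix = [0] + list(accumulate(layer_ops))  # prefix[i] = work of first i layers
    inf = float("inf")
    # best[k][i]: smallest bottleneck when the first k devices run the first i layers
    best = [[inf] * (n + 1) for _ in range(m + 1)]
    cut = [[0] * (n + 1) for _ in range(m + 1)]
    best[0][0] = 0.0
    for k in range(1, m + 1):
        for i in range(n + 1):
            for j in range(i + 1):  # device k runs layers j..i-1 (empty if j == i)
                stage = (prefix[i] - prefix[j]) / device_speeds[k - 1]
                cand = max(best[k - 1][j], stage)
                if cand < best[k][i]:
                    best[k][i] = cand
                    cut[k][i] = j
    # Walk the recorded cut points backwards to recover each device's block.
    blocks, i = [], n
    for k in range(m, 0, -1):
        j = cut[k][i]
        blocks.append((j, i))  # half-open layer range [j, i)
        i = j
    blocks.reverse()
    return best[m][n], blocks


bottleneck, blocks = partition_layers([4, 2, 6, 8], [2, 4])
print(bottleneck, blocks)  # 3.5 [(0, 2), (2, 4)]
```

In this toy instance the slower device takes the first two layers and the faster one takes the last two, which balances the two stage latencies at 3.0 and 3.5. A realistic formulation would also need to account for inter-FPGA communication and per-layer hardware tailoring, which this sketch leaves out.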
Keywords/Search Tags: Heterogeneous Pipelining System, Real-Time System, Deep Neural Network, Optimization Algorithm Design