
Research on Strong Real-Time Scheduling for CNN Inference Tasks on a CPU-GPU Architecture

Posted on: 2024-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: C Z Meng
GTID: 2568306923474674
Subject: Software engineering

Abstract/Summary:
Real-time inference based on Convolutional Neural Networks (CNNs) has been a longstanding research problem in the field of Artificial Intelligence (AI). The General-Purpose Graphics Processing Unit (GPGPU) is a common solution for accelerating CNN inference tasks due to its massively parallel computing capabilities. With the rapid growth and popularity of AI applications in recent years, increasingly complex AI applications require multiple different CNN inference models, each serving a different task, to run on a single device simultaneously. In particular, in complex embedded systems with stringent real-time requirements, such as unmanned vehicles, the system must ensure that latency-sensitive critical tasks complete in a timely manner. However, increasing the GPU's hardware computing power also increases hardware cost and power consumption. An important approach to this problem is therefore to schedule multiple CNN inference tasks on a single GPU in a time-division multiplexing manner and to prioritize resource allocation for high-priority tasks, which guarantees the response latency of critical tasks while keeping the hardware unchanged.

Our main work addresses this problem of strong real-time scheduling of multiple CNN inference tasks on a single CPU-GPU device. Existing GPU scheduling methods focus on predicting task execution times and providing preemption support based on the hardware characteristics of specific devices, which makes them difficult to generalize across different GPUs or different underlying algorithm implementations. Therefore, in this paper we propose a purely software-level scheduling system. It models the performance of, and predicts the execution time of, each basic mathematical operation in a CNN based on the computational characteristics of the inference model itself, and then slices a CNN inference task into time slices whose actual execution time is close to a given value. In our experiments, this approach achieves an average time-estimation error of less than 5% and an average time-slice division error of less than 10% on four different types of GPU devices with different underlying algorithm library implementations. It guarantees the real-time performance of critical tasks, reducing their response time to less than 10 ms while keeping GPU utilization above 98%.

Compared with existing methods, the innovations of this paper are as follows. (1) We introduce a time-estimation model at the level of CNN base operations. An important feature of this model is that it is hardware-independent: compared with model-level or component-level estimation it is more fine-grained and more accurate, and compared with hardware-based estimation it is more generic. (2) We propose a purely software-based approach for slicing CNN inference tasks into time slices. The slicing method relies only on the software features of the CNN inference task itself, without requiring compiler or GPU driver support, so it adapts to different hardware devices. It also provides a uniform time-slice scheduling interface that can accommodate a variety of time-slice-based scheduling strategies. The proposed scheduling method is further validated in an unmanned vehicle system, where it ensures the real-time performance of safety-critical tasks such as obstacle detection.
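To make the slicing and scheduling idea concrete, the following Python sketch shows one way operator-level time estimates could be grouped into time slices of roughly a target length and dispatched under a simple priority policy. It is an illustration under assumptions, not the thesis's implementation: the Op costs, the slice_model grouping heuristic, and the run dispatcher are hypothetical names invented here, whereas the thesis derives per-operation estimates from its performance model.

```python
"""Illustrative sketch (not the thesis code): group a CNN's operators into
time slices near a target duration, then dispatch slices by priority."""
from dataclasses import dataclass, field
from typing import List
import heapq


@dataclass
class Op:
    name: str
    est_ms: float  # predicted execution time of this basic operation


def slice_model(ops: List[Op], target_ms: float) -> List[List[Op]]:
    """Group consecutive operators into slices whose total predicted time
    is as close as possible to target_ms (an operator is never split)."""
    slices, current, acc = [], [], 0.0
    for op in ops:
        # Close the current slice if adding this op would overshoot the
        # target by more than the slice currently undershoots it.
        if current and (acc + op.est_ms - target_ms) > (target_ms - acc):
            slices.append(current)
            current, acc = [], 0.0
        current.append(op)
        acc += op.est_ms
    if current:
        slices.append(current)
    return slices


@dataclass(order=True)
class Task:
    priority: int  # lower value = more critical
    slices: List[List[Op]] = field(compare=False)
    next_slice: int = field(default=0, compare=False)


def run(tasks: List[Task]) -> None:
    """Time-division multiplexing at slice granularity: always dispatch the
    next slice of the highest-priority task that still has work left."""
    ready = list(tasks)
    heapq.heapify(ready)
    while ready:
        task = heapq.heappop(ready)
        sl = task.slices[task.next_slice]
        # A real system would launch this slice's kernels on the GPU here;
        # the sketch only reports what would run.
        print(f"prio {task.priority}: slice {task.next_slice} "
              f"({sum(op.est_ms for op in sl):.1f} ms predicted)")
        task.next_slice += 1
        if task.next_slice < len(task.slices):
            heapq.heappush(ready, task)


if __name__ == "__main__":
    # Hypothetical per-operator estimates for a critical and a background model.
    critical = [Op("conv1", 1.2), Op("relu1", 0.1), Op("conv2", 2.3),
                Op("pool1", 0.4), Op("fc", 0.8)]
    background = [Op(f"conv{i}", 1.5) for i in range(8)]
    tasks = [Task(priority=0, slices=slice_model(critical, target_ms=2.0)),
             Task(priority=1, slices=slice_model(background, target_ms=2.0))]
    run(tasks)
```

Because preemption in this scheme happens only at slice boundaries, a lower-priority slice can delay a critical task by at most roughly one target slice length, which is why keeping the actual slice durations close to the chosen value matters for bounding response latency.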
Keywords/Search Tags:Convolutional Neural Network, Real-Time Scheduling, General Purpose Graphics Processing Unit, Time-Division Multiplexing, Performance Modeling