
Research on Strong Real-Time Scheduling for CNN Inference Tasks on a CPU-GPU Architecture

Posted on: 2024-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: C Z Meng
GTID: 2568306923474674
Subject: Software engineering

Abstract/Summary:
Real-time inference based on Convolutional Neural Networks (CNNs) has been a longstanding research problem in the field of Artificial Intelligence (AI). The General-Purpose Graphics Processing Unit (GPGPU) is a common solution for accelerating CNN inference tasks due to its massively parallel computing capabilities. With the rapid growth and popularity of AI applications in recent years, increasingly complex AI applications require multiple different CNN inference models, each serving a different task, to run on a single device simultaneously. In particular, in complex embedded systems with stringent real-time requirements, such as unmanned vehicles, the system must ensure that latency-sensitive critical tasks complete in a timely manner. However, increasing the GPU's hardware computing power also increases hardware cost and power consumption. An important approach to this problem is therefore to schedule multiple CNN inference tasks on a single GPU in a time-division multiplexing manner and to prioritize resource allocation for high-priority tasks, which guarantees the response latency of critical tasks while keeping the hardware unchanged.

Our main work addresses this problem of strong real-time scheduling of multiple CNN inference tasks on a single CPU-GPU device. Existing GPU scheduling methods focus on predicting task execution times and providing preemption support based on the hardware characteristics of specific devices, which makes them difficult to generalize across different GPUs or different underlying algorithm implementations. Therefore, in this paper we propose a purely software-level scheduling system. It models the performance of, and predicts the execution time of, each basic mathematical operation in a CNN based on the computational characteristics of the inference model itself, and then slices a CNN inference task into time slices whose actual execution time is close to a given value. In our experiments, this approach achieves an average time-estimation error of less than 5% and an average time-slice division error of less than 10% on four different types of GPU devices with different underlying algorithm library implementations. It guarantees the real-time performance of critical tasks, reducing their response time to less than 10 ms while keeping GPU utilization above 98%.

Compared with existing methods, the innovations of this paper are as follows. (1) We introduce a time-estimation model at the level of CNN base operations. An important feature of this model is that it is hardware-independent: compared with model-level or component-level estimation it is more fine-grained and more accurate, and compared with hardware-based estimation it is more generic. (2) We propose a purely software-based approach for slicing CNN inference tasks into time slices. The slicing method relies only on the software features of the CNN inference task itself, without requiring compiler or GPU driver support, so it adapts to different hardware devices. It also provides a uniform time-slice scheduling interface that can accommodate a variety of time-slice-based scheduling strategies. The proposed scheduling method is further validated in an unmanned vehicle system, where it ensures the real-time performance of safety-critical tasks such as obstacle detection.
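To make the slicing and scheduling idea concrete, the following Python sketch shows one way operator-level time estimates could be grouped into time slices of roughly a target length and dispatched under a simple priority policy. It is an illustration under assumptions, not the thesis's implementation: the Op costs, the slice_model grouping heuristic, and the run dispatcher are hypothetical names invented here, whereas the thesis derives per-operation estimates from its performance model.

```python
"""Illustrative sketch (not the thesis code): group a CNN's operators into
time slices near a target duration, then dispatch slices by priority."""
from dataclasses import dataclass, field
from typing import List
import heapq


@dataclass
class Op:
    name: str
    est_ms: float  # predicted execution time of this basic operation


def slice_model(ops: List[Op], target_ms: float) -> List[List[Op]]:
    """Group consecutive operators into slices whose total predicted time
    is as close as possible to target_ms (an operator is never split)."""
    slices, current, acc = [], [], 0.0
    for op in ops:
        # Close the current slice if adding this op would overshoot the
        # target by more than the slice currently undershoots it.
        if current and (acc + op.est_ms - target_ms) > (target_ms - acc):
            slices.append(current)
            current, acc = [], 0.0
        current.append(op)
        acc += op.est_ms
    if current:
        slices.append(current)
    return slices


@dataclass(order=True)
class Task:
    priority: int  # lower value = more critical
    slices: List[List[Op]] = field(compare=False)
    next_slice: int = field(default=0, compare=False)


def run(tasks: List[Task]) -> None:
    """Time-division multiplexing at slice granularity: always dispatch the
    next slice of the highest-priority task that still has work left."""
    ready = list(tasks)
    heapq.heapify(ready)
    while ready:
        task = heapq.heappop(ready)
        sl = task.slices[task.next_slice]
        # A real system would launch this slice's kernels on the GPU here;
        # the sketch only reports what would run.
        print(f"prio {task.priority}: slice {task.next_slice} "
              f"({sum(op.est_ms for op in sl):.1f} ms predicted)")
        task.next_slice += 1
        if task.next_slice < len(task.slices):
            heapq.heappush(ready, task)


if __name__ == "__main__":
    # Hypothetical per-operator estimates for a critical and a background model.
    critical = [Op("conv1", 1.2), Op("relu1", 0.1), Op("conv2", 2.3),
                Op("pool1", 0.4), Op("fc", 0.8)]
    background = [Op(f"conv{i}", 1.5) for i in range(8)]
    tasks = [Task(priority=0, slices=slice_model(critical, target_ms=2.0)),
             Task(priority=1, slices=slice_model(background, target_ms=2.0))]
    run(tasks)
```

Because preemption in this scheme happens only at slice boundaries, a lower-priority slice can delay a critical task by at most roughly one target slice length, which is why keeping the actual slice durations close to the chosen value matters for bounding response latency.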
Keywords/Search Tags:Convolutional Neural Network, Real-Time Scheduling, General Purpose Graphics Processing Unit, Time-Division Multiplexing, Performance Modeling