
Research On Real-Time Deep Vision Computing Based On Reuse And Video Coding

Posted on: 2024-07-19    Degree: Master    Type: Thesis
Country: China    Candidate: X K Wei    Full Text: PDF
GTID: 2568307064485414    Subject: Computer Science and Technology
Abstract/Summary:
Artificial Intelligence (AI) applications are booming in areas such as real-time video analytics, cognitive assistance, and the Internet of Vehicles. These applications require visual computing to provide high-quality mobile services, which are typically delivered by computationally complex deep Convolutional Neural Networks (CNNs). However, it is particularly difficult to execute full-size CNNs on mobile devices with limited memory and computing capacity, while the growing demand for real-time human-device interaction requires low-latency services to be continuously available.

Currently, two types of solutions aim to provide low-latency services. The first reduces the amount of computation by streamlining the network structure directly on the mobile device. However, simplifying the model usually lowers inference performance, and even the most streamlined networks remain computationally heavy for mobile devices. The second offloads the computation to an edge server with more computing capacity. In this case, the total latency consists of inference latency and communication latency. To reduce the communication latency, video compression coding is often used to exploit the spatio-temporal similarity of continuous video and avoid transmitting similar image regions; even so, low latency is hard to achieve when complete video data must be sent over low-bandwidth links. To reduce the inference latency, a common approach is result reuse, which trades a small loss of accuracy for a large time gain by directly reading the results of similar inputs, but this introduces additional overhead for matching inputs.

To address these problems of the offloading scheme, we observe that similar image regions are already indicated in the offloaded video coding, so the intermediate or final computation results of these regions can be reused directly, saving inference time without repeated input matching. At the same time, because the data of similar regions need not be computed, the transmitted data can be further simplified to save communication time. Therefore, this paper reduces end-to-end latency by exploiting the rich information in video compression coding to efficiently recode the video data and reuse identical computations or results. Specifically, the work is as follows:

1. To reduce inference latency through reuse without repeated input matching, we design the Single-Pass Clustering and Maximum Rectangle Search Algorithm (SPSA), which clusters similar macroblocks according to the similarity matching of the video encoding, tolerates and distinguishes interference, and cuts the clusters into the largest rectangular regions so that more computation can be reused in the CNN layers on the edge (a rough sketch of the rectangle search appears after this list).

2. Because continuous video may not change significantly, and to avoid repetitive transmission and inference, we design an offloading decision that integrates task characteristics and current-frame characteristics, so that video frames without significant changes can directly reuse the computation results of the previous frame on the mobile device, skipping the transmission and computation of that frame.

3. To further compress the transmitted data during offloading while preventing over-compressed coding from degrading the final computational results, we design a Rematching and Partial Ignoring of Residuals (RePI) strategy. On top of the base video compression coding, RePI guides the final coding by controlling a residual-neglect threshold that weighs the accuracy loss against the time gain.

4. Since video coding and reuse computation involve numerous parameters and thresholds, it is difficult to adapt to task and scene changes by tuning them manually one by one. We design a simple adaptive tuning optimization algorithm that adjusts the coding parameters quickly and continuously with heuristics, through a feedback mechanism, a control strategy, an execution strategy, and a dynamic parameter table, achieving a controlled accuracy loss and reduced latency (a simplified feedback step is sketched after the results below).
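The abstract does not detail SPSA itself, but the maximum-rectangle step it names can be illustrated with a standard largest-rectangle search over a grid of macroblocks that the encoder has already flagged as reusable. The Python sketch below is only an illustration under that assumption, not the thesis's implementation; the function name, the reusable grid, and the return format are hypothetical.

    from typing import List, Tuple

    def largest_reusable_rectangle(reusable: List[List[bool]]) -> Tuple[int, int, int, int, int]:
        """Find the largest axis-aligned rectangle of reusable macroblocks.

        reusable[r][c] is True when macroblock (r, c) was matched to a similar
        block in the reference frame, so its CNN results could be reused
        instead of recomputed. Returns (area, top, left, height, width).
        """
        if not reusable or not reusable[0]:
            return (0, 0, 0, 0, 0)

        cols = len(reusable[0])
        heights = [0] * cols          # consecutive reusable blocks per column
        best = (0, 0, 0, 0, 0)

        for r, row in enumerate(reusable):
            # Update the per-column histogram for the current row.
            for c in range(cols):
                heights[c] = heights[c] + 1 if row[c] else 0

            # Largest rectangle in the histogram, with a sentinel bar of height 0.
            stack = []                # column indices with increasing heights
            for c in range(cols + 1):
                h = heights[c] if c < cols else 0
                while stack and heights[stack[-1]] >= h:
                    top_h = heights[stack.pop()]
                    left = stack[-1] + 1 if stack else 0
                    width = c - left
                    area = top_h * width
                    if area > best[0]:
                        best = (area, r - top_h + 1, left, top_h, width)
                stack.append(c)

        return best

In such a setup the grid would come from the encoder's macroblock matching (for example, skip or near-zero-motion blocks), and each rectangle found this way would correspond to a contiguous region whose CNN results could be copied from the reference frame rather than recomputed.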
We integrate the SPSA algorithm for identifying reusable regions, the decision on whether offloading is required, and the RePI strategy for simplifying the transferred data into an HRCache prototype. The prototype's parameters and thresholds are adjusted by the adaptive tuning optimization algorithm, enabling it to reduce both communication latency and inference latency with a manageable loss of accuracy. Compared with the full offloading scheme, HRCache reduces the average latency by about 13.60% to 18.83%, at a small accuracy cost of 1.25% in Top-1 accuracy for classification and 0.135 in IoU for object detection.
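The tuning rules themselves are not given in the abstract; the sketch below only illustrates the general feedback idea of trading accuracy loss against latency within budgets, as in point 4 above. The parameter names, budgets, and step size are assumptions made for illustration and are not taken from the thesis.

    from dataclasses import dataclass

    @dataclass
    class TuningState:
        """Hypothetical knobs standing in for the prototype's parameters."""
        residual_threshold: float = 0.10   # residuals below this are ignored
        reuse_similarity: float = 0.90     # similarity needed to reuse results

    def tune_step(state: TuningState,
                  measured_latency_ms: float,
                  measured_accuracy_drop: float,
                  latency_budget_ms: float = 50.0,
                  accuracy_budget: float = 0.02,
                  step: float = 0.01) -> TuningState:
        """One heuristic feedback step: trade accuracy against latency within budgets.

        If the accuracy loss exceeds its budget, back off; otherwise, if latency
        exceeds its budget, compress and reuse more aggressively.
        """
        if measured_accuracy_drop > accuracy_budget:
            # Too much accuracy lost: keep more residuals, demand closer matches.
            state.residual_threshold = max(0.0, state.residual_threshold - step)
            state.reuse_similarity = min(1.0, state.reuse_similarity + step)
        elif measured_latency_ms > latency_budget_ms:
            # Too slow: ignore more residuals and reuse more regions.
            state.residual_threshold = min(1.0, state.residual_threshold + step)
            state.reuse_similarity = max(0.0, state.reuse_similarity - step)
        return state

A caller would invoke such a step once per frame or per measurement window, for example tune_step(state, measured_latency_ms=72.0, measured_accuracy_drop=0.008), and feed the updated thresholds back into the encoder and the reuse logic.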
Keywords/Search Tags:Edge Computing, Mobile Deep Vision, Reuse, Re-compression, Parameter Tuning