
Research On Real-Time Deep Vision Computing Based On Reuse And Video Coding

Posted on: 2024-07-19    Degree: Master    Type: Thesis
Country: China    Candidate: X K Wei    Full Text: PDF
GTID: 2568307064485414    Subject: Computer Science and Technology
Abstract/Summary:
Artificial Intelligence (AI) applications are booming in areas such as real-time video analytics, cognitive assistance, and the Internet of Vehicles. These applications require visual computing to provide high-quality mobile services, which are typically delivered by computationally complex deep Convolutional Neural Networks (CNNs). However, it is particularly difficult to execute full-size CNNs on mobile devices with limited memory and computing capacity, while the growing demand for real-time human-device interaction requires low-latency services to be continuously available.

Currently, two types of solutions aim to provide low-latency services. The first reduces the amount of computation by streamlining the network structure directly on the mobile device. However, simplifying the model usually lowers inference performance, and even the most streamlined networks remain computationally heavy for mobile devices. The second offloads the computation to an edge server with more computing capacity. In this case, the total latency consists of inference latency and communication latency. To reduce the communication latency, video compression coding is often used to exploit the spatio-temporal similarity of continuous video and avoid transmitting similar image regions; even so, low latency is hard to achieve when complete video data must be sent over low-bandwidth links. To reduce the inference latency, a common approach is result reuse, which trades a small loss of accuracy for a large time gain by directly reading the results of similar inputs, but this introduces additional overhead for matching inputs.

To address these problems of the offloading scheme, we observe that similar image regions are already indicated in the offloaded video coding, so the intermediate or final computation results of these regions can be reused directly, saving inference time without repeated input matching. At the same time, because the data of similar regions need not be computed, the transmitted data can be further simplified to save communication time. Therefore, this paper reduces end-to-end latency by exploiting the rich information in video compression coding to efficiently recode the video data and reuse identical computations or results. Specifically, the work is as follows:

1. To reduce inference latency through reuse without repeated input matching, we design the Single-Pass Clustering and Maximum Rectangle Search Algorithm (SPSA), which clusters similar macroblocks according to the similarity matching of the video encoding, tolerates and distinguishes interference, and cuts the clusters into the largest rectangular regions so that more computation can be reused in the CNN layers on the edge (a rough sketch of the rectangle search appears after this list).

2. Because continuous video may not change significantly, and to avoid repetitive transmission and inference, we design an offloading decision that integrates task characteristics and current-frame characteristics, so that video frames without significant changes can directly reuse the computation results of the previous frame on the mobile device, skipping the transmission and computation of that frame.

3. To further compress the transmitted data during offloading while preventing over-compressed coding from degrading the final computational results, we design a Rematching and Partial Ignoring of Residuals (RePI) strategy. On top of the base video compression coding, RePI guides the final coding by controlling a residual-neglect threshold that weighs the accuracy loss against the time gain.

4. Since video coding and reuse computation involve numerous parameters and thresholds, it is difficult to adapt to task and scene changes by tuning them manually one by one. We design a simple adaptive tuning optimization algorithm that adjusts the coding parameters quickly and continuously with heuristics, through a feedback mechanism, a control strategy, an execution strategy, and a dynamic parameter table, achieving a controlled accuracy loss and reduced latency (a simplified feedback step is sketched after the results below).
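The abstract does not detail SPSA itself, but the maximum-rectangle step it names can be illustrated with a standard largest-rectangle search over a grid of macroblocks that the encoder has already flagged as reusable. The Python sketch below is only an illustration under that assumption, not the thesis's implementation; the function name, the reusable grid, and the return format are hypothetical.

    from typing import List, Tuple

    def largest_reusable_rectangle(reusable: List[List[bool]]) -> Tuple[int, int, int, int, int]:
        """Find the largest axis-aligned rectangle of reusable macroblocks.

        reusable[r][c] is True when macroblock (r, c) was matched to a similar
        block in the reference frame, so its CNN results could be reused
        instead of recomputed. Returns (area, top, left, height, width).
        """
        if not reusable or not reusable[0]:
            return (0, 0, 0, 0, 0)

        cols = len(reusable[0])
        heights = [0] * cols          # consecutive reusable blocks per column
        best = (0, 0, 0, 0, 0)

        for r, row in enumerate(reusable):
            # Update the per-column histogram for the current row.
            for c in range(cols):
                heights[c] = heights[c] + 1 if row[c] else 0

            # Largest rectangle in the histogram, with a sentinel bar of height 0.
            stack = []                # column indices with increasing heights
            for c in range(cols + 1):
                h = heights[c] if c < cols else 0
                while stack and heights[stack[-1]] >= h:
                    top_h = heights[stack.pop()]
                    left = stack[-1] + 1 if stack else 0
                    width = c - left
                    area = top_h * width
                    if area > best[0]:
                        best = (area, r - top_h + 1, left, top_h, width)
                stack.append(c)

        return best

In such a setup the grid would come from the encoder's macroblock matching (for example, skip or near-zero-motion blocks), and each rectangle found this way would correspond to a contiguous region whose CNN results could be copied from the reference frame rather than recomputed.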
We integrate the SPSA algorithm for identifying reusable regions, the decision on whether offloading is required, and the RePI strategy for simplifying the transferred data into an HRCache prototype. The prototype's parameters and thresholds are adjusted by the adaptive tuning optimization algorithm, enabling it to reduce both communication latency and inference latency with a manageable loss of accuracy. Compared with the full offloading scheme, HRCache reduces the average latency by about 13.60% to 18.83%, at a small accuracy cost of 1.25% in Top-1 accuracy for classification and 0.135 in IoU for object detection.
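The tuning rules themselves are not given in the abstract; the sketch below only illustrates the general feedback idea of trading accuracy loss against latency within budgets, as in point 4 above. The parameter names, budgets, and step size are assumptions made for illustration and are not taken from the thesis.

    from dataclasses import dataclass

    @dataclass
    class TuningState:
        """Hypothetical knobs standing in for the prototype's parameters."""
        residual_threshold: float = 0.10   # residuals below this are ignored
        reuse_similarity: float = 0.90     # similarity needed to reuse results

    def tune_step(state: TuningState,
                  measured_latency_ms: float,
                  measured_accuracy_drop: float,
                  latency_budget_ms: float = 50.0,
                  accuracy_budget: float = 0.02,
                  step: float = 0.01) -> TuningState:
        """One heuristic feedback step: trade accuracy against latency within budgets.

        If the accuracy loss exceeds its budget, back off; otherwise, if latency
        exceeds its budget, compress and reuse more aggressively.
        """
        if measured_accuracy_drop > accuracy_budget:
            # Too much accuracy lost: keep more residuals, demand closer matches.
            state.residual_threshold = max(0.0, state.residual_threshold - step)
            state.reuse_similarity = min(1.0, state.reuse_similarity + step)
        elif measured_latency_ms > latency_budget_ms:
            # Too slow: ignore more residuals and reuse more regions.
            state.residual_threshold = min(1.0, state.residual_threshold + step)
            state.reuse_similarity = max(0.0, state.reuse_similarity - step)
        return state

A caller would invoke such a step once per frame or per measurement window, for example tune_step(state, measured_latency_ms=72.0, measured_accuracy_drop=0.008), and feed the updated thresholds back into the encoder and the reuse logic.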
Keywords/Search Tags:Edge Computing, Mobile Deep Vision, Reuse, Re-compression, Parameter Tuning