Research On Key Technologies Of Video Coding System For Human Machine Vision

Posted on: 2022-05-09  Degree: Master  Type: Thesis
Country: China  Candidate: L Zhou  Full Text: PDF
GTID: 2518306311976339  Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of computer and multimedia technology, video is widely used in daily life and has gradually become the most important form of information expression. Because the amount of video data keeps growing while network bandwidth remains limited, video coding is needed for efficient video transmission. Traditional video coding methods use the sum of squared errors as the distortion measure in rate-distortion optimization, aiming to improve the objective quality of the reconstructed video under a limited bitrate. With the wide application of the Internet of Things and 5G technology, the application scenarios of video are gradually changing. More and more videos are consumed by machines for a variety of computer vision tasks, and video semantic segmentation is one of them. Video coding for machine vision and for hybrid human-machine vision has therefore become a research focus, and video coding for machines (VCM) has emerged in response. This thesis studies the key technologies of a video coding system for human-machine vision. The main work and contributions are as follows.

1. The key technologies of video coding systems for human-machine vision are surveyed, including semantic segmentation, video coding standards and rate-distortion optimization. The thesis summarizes representative algorithms and analyzes their merits and demerits, which provides a theoretical basis for the research.

2. For video semantic segmentation, a convolutional neural network based on ConvLSTM is proposed. Through ConvLSTM, the network captures the correlation between adjacent frames and uses it to guide the prediction of the next frame. The backbone uses dilated convolution, pyramid pooling, dense connection blocks and other structures to enlarge the receptive field of the network. To avoid overfitting during training, data augmentation and learning rate decay are adopted. Experimental results indicate that the proposed network makes effective use of the temporal information between consecutive frames and improves the accuracy of the segmentation results, especially for dynamic objects and small objects in videos.

3. In a video coding system for human-machine vision, the goal of coding is to ensure both the semantic quality and the objective quality of the decoded video at a given bitrate. Therefore, for the video semantic segmentation scenario, a rate-distortion optimization algorithm based on a hybrid distortion measure is proposed so that the semantic segmentation accuracy and the fidelity of the reconstructed video are preserved at the same time. First, a semantic distortion metric is defined to measure the semantic segmentation distortion of the video before and after encoding. Then the corresponding Lagrange multiplier is calculated. Finally, the rate-distortion optimization process in video coding is adjusted accordingly. The results indicate that, compared with the video coding reference software, the proposed method improves the semantic accuracy of the reconstructed video while preserving video fidelity.
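
The abstract does not give the formula for the hybrid distortion measure, so the following is only a minimal Python sketch of how a rate-distortion decision with a hybrid distortion term could look. Everything in it is an assumption made for illustration: the weighted-sum form of the hybrid distortion, the mismatch-count semantic distortion, and the names `alpha`, `scale`, `hybrid_rd_cost` and `choose_mode` are hypothetical and are not taken from the thesis or from any reference software.

```python
import numpy as np


def sse(original, reconstructed):
    """Sum of squared errors between two pixel blocks (the traditional distortion)."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    return float(np.sum(diff ** 2))


def semantic_distortion(labels_ref, labels_rec):
    """Assumed semantic distortion: number of pixels whose segmentation
    label differs between the original and the reconstructed block."""
    return float(np.sum(labels_ref != labels_rec))


def hybrid_rd_cost(d_pixel, d_semantic, rate_bits, lam, alpha=0.5, scale=1.0):
    """Assumed hybrid cost: J = (1 - alpha) * D_pixel + alpha * scale * D_sem + lam * R.

    alpha trades pixel fidelity against semantic accuracy; scale brings the
    two distortion terms to comparable magnitudes. Both are hypothetical.
    """
    d_hybrid = (1.0 - alpha) * d_pixel + alpha * scale * d_semantic
    return d_hybrid + lam * rate_bits


def choose_mode(candidates, lam, alpha=0.5, scale=1.0):
    """Return the name of the candidate with the lowest hybrid RD cost.

    candidates: list of (name, d_pixel, d_semantic, rate_bits) tuples.
    """
    best = min(
        candidates,
        key=lambda c: hybrid_rd_cost(c[1], c[2], c[3], lam, alpha, scale),
    )
    return best[0]


# Two hypothetical coding modes with the same rate: "A" has lower pixel
# distortion, "B" keeps the segmentation labels intact.
modes = [("A", 100.0, 10.0, 50), ("B", 120.0, 0.0, 50)]
pixel_only_choice = choose_mode(modes, lam=1.0, alpha=0.0, scale=10.0)
hybrid_choice = choose_mode(modes, lam=1.0, alpha=0.5, scale=10.0)
```

With `alpha=0.0` the cost reduces to the traditional SSE-based cost, so the mode with the lower pixel distortion is selected. With a nonzero `alpha`, a mode that better preserves the segmentation labels can be selected even though its pixel distortion is higher, which is the trade-off the hybrid measure is meant to control.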
Keywords/Search Tags: video coding, video semantic segmentation, human machine hybrid vision, rate distortion optimization, hybrid distortion model