
Research And Implementation Of Video Compression Based On Deep Learning

Posted on: 2019-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: X L Liu
Full Text: PDF
GTID: 2428330599977704
Subject: Computer technology

Abstract/Summary:
Since the 1990s, with the widespread adoption of video applications such as digital high-definition television, digital stereoscopic television, Internet streaming media, wireless mobile video communication, and high-definition video surveillance, the storage and transmission of massive video data have become two major problems demanding solutions. Deep learning has achieved important breakthroughs in many video-related tasks, such as face recognition, object detection and tracking, action recognition, and video content review, but it has not yet been thoroughly studied in the field of video compression. This thesis exploits the strengths of deep learning in video processing to improve the quality of video compression.

Video compression rests on two ideas: compressing individual video frames to remove the spatial redundancy within each image, and removing the temporal redundancy between frames through video frame interpolation. Based on these two ideas, this thesis studies the following.

To remove the spatial redundancy within video frames, this thesis first designs an autoencoder guided by saliency maps. During encoding, the frame to be compressed is fed into the network, and convolutional layers gradually reduce the number of feature maps and the spatial scale of the image, mapping it from pixel space into a new feature space. Statistical redundancy in the feature space is then removed by quantization and CABAC entropy coding, bit allocation across pixels is guided by the saliency map, and the final bitstream is output. Experimental results show that, at the same low bit rates, the proposed autoencoder outperforms JPEG on the standard Kodak test set; it also outperforms a recently published compression method based on long short-term memory networks and another based on variational autoencoders.

To remove the temporal redundancy between adjacent frames, this thesis designs a video frame interpolation method based on multi-scale convolutional networks and adversarial training. The multi-scale structure captures object motion more effectively, and adversarial training makes the interpolated frames more consistent with the human visual system. The GAN generator produces the interpolated frame, and the GAN discriminator judges how plausible that frame is. Finally, the influence of the choice of loss function and of the multi-scale structure on the interpolation results is compared experimentally. Compared with recently proposed methods, namely a convolutional network based on optical flow prediction (OFP), the deep voxel flow network (DVF), and the multi-scale network BeyondMSE, the proposed method achieves better interpolation results on both the UCF101 and HMDB-51 video datasets.
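As a rough illustration of the first pipeline (convolutional encoder, quantization, entropy coding, with bit allocation steered by a saliency map), the following PyTorch sketch shows one way such an autoencoder could be set up. This is not the thesis implementation: the class name SaliencyGuidedAE and the layer sizes are hypothetical, straight-through rounding stands in for the quantizer, CABAC entropy coding is omitted entirely, and saliency guidance is assumed to act as a per-pixel weight on the distortion loss.

```python
# Minimal sketch of a saliency-guided convolutional autoencoder for
# single-frame compression (assumes PyTorch). All names and sizes are
# hypothetical; the thesis's actual architecture, quantizer, and CABAC
# entropy coder are not specified in the abstract.
import torch
import torch.nn as nn

class SaliencyGuidedAE(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Encoder: strided convolutions shrink the spatial scale and map
        # the frame from pixel space into a compact feature space.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(channels, channels // 2, 5, stride=2, padding=2),
        )
        # Decoder mirrors the encoder with transposed convolutions.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(channels // 2, channels, 5, stride=2,
                               padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, channels, 5, stride=2,
                               padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, 3, 5, stride=2,
                               padding=2, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, frame, saliency):
        code = self.encoder(frame)
        # Straight-through rounding stands in for the quantizer; the real
        # system would follow this with CABAC entropy coding of the codes.
        quantized = code + (torch.round(code) - code).detach()
        recon = self.decoder(quantized)
        # Saliency-weighted distortion (an assumption): salient pixels are
        # penalized more heavily, so they effectively receive more bits.
        weight = 1.0 + saliency  # saliency in [0, 1], shape (B, 1, H, W)
        loss = (weight * (recon - frame) ** 2).mean()
        return recon, loss

# Usage with a dummy frame whose sides are divisible by 8:
model = SaliencyGuidedAE()
recon, loss = model(torch.rand(1, 3, 256, 256), torch.rand(1, 1, 256, 256))
```

The straight-through trick keeps quantization in the forward pass while letting gradients flow through it unchanged, which is a common workaround for training compression autoencoders end to end.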
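The adversarial part of the interpolation method can likewise be sketched. The toy generator and discriminator below are single-scale stand-ins assuming PyTorch, not the thesis's multi-scale architecture, and the helper names (make_generator, train_step, the weight lam) are invented for illustration; what the sketch does capture is the division of labour described above: the generator predicts the middle frame from its two neighbours, and the discriminator judges whether a middle frame looks real.

```python
# Minimal sketch of adversarial training for frame interpolation
# (assumes PyTorch). Single-scale toy networks only; the thesis uses a
# multi-scale generator whose coarse-to-fine pyramid is not shown here.
import torch
import torch.nn as nn

def make_generator():
    # Maps the two neighbouring frames (6 channels stacked) to the
    # predicted middle frame (3 channels).
    return nn.Sequential(
        nn.Conv2d(6, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
    )

def make_discriminator():
    # Scores a frame: real (drawn from the video) vs. interpolated.
    return nn.Sequential(
        nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1),
    )

G, D = make_generator(), make_discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(prev_frame, next_frame, real_middle, lam=0.1):
    fake_middle = G(torch.cat([prev_frame, next_frame], dim=1))

    # Discriminator step: true middle frames should score 1, fakes 0.
    real_logits = D(real_middle)
    fake_logits = D(fake_middle.detach())
    d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
             bce(fake_logits, torch.zeros_like(fake_logits))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: L1 reconstruction plus an adversarial term that
    # pushes interpolated frames toward the look of real frames.
    adv_logits = D(fake_middle)
    g_loss = (fake_middle - real_middle).abs().mean() + \
             lam * bce(adv_logits, torch.ones_like(adv_logits))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

The adversarial term is what makes the result "more in line with the human visual system": a pure pixel loss tends to produce blurry averages of plausible motions, while the discriminator penalizes exactly that blur.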
Keywords/Search Tags: video compression, deep learning, convolutional neural network, adversarial training, coding