Research On Deep Learning Based Video Coding

Posted on: 2020-09-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Wang
GTID: 1368330590472972
Subject: Computer application technology
Abstract/Summary:
With the development of communications and the Internet and the popularity of mobile terminals and smart devices, traditional multimedia applications such as digital broadcast television, Internet video, video conferencing, telemedicine, and distance education, together with emerging applications such as 3D video, virtual reality video, and short video, have enriched people's daily lives and caused the volume of video data to grow explosively. This volume of data poses great challenges for storage and network transmission, and how to store it reliably and transmit it efficiently has become an urgent problem. Digital video compression plays a key role in reducing the size of video data, and its wide application in communications, computing, and radio and television has driven the emergence and development of digital video coding standards. Although the most recently released video coding standards, HEVC and AVS2, satisfy the compression requirements of HD and UHD digital video, the development of artificial intelligence and the arrival of the 5G era mean that even larger amounts of video data will place higher demands on future video coding standards. It is therefore necessary to further improve compression efficiency on the basis of the existing standards. In recent years, with the development of deep learning, the convolutional neural network, a typical deep neural network, has achieved remarkable results in fields such as computer vision, speech recognition, and natural language processing. Improving the compression performance of video coding with deep learning not only provides a technical reserve for future digital video coding standards, but is also a frontier problem and research hotspot in the field of video coding. This dissertation explores how to improve the compression performance of video coding using deep learning, focusing on three main modules of the digital video coding standard framework: intra prediction, inter prediction, and the in-loop filter. The content of the dissertation is divided into three parts, detailed as follows.

First, a multi-scale convolutional neural network based intra prediction method is proposed to improve the accuracy of intra prediction in video coding. Directional interpolation based intra prediction is widely used in existing digital video coding standards. It predicts blocks with a dominant texture direction well, but performs poorly on blocks with complex texture or weak directionality. To improve the accuracy of intra prediction in existing video coding standards and to reserve technology for the next generation of standards, the proposed algorithm uses two sub-networks: a multi-scale feature extraction network and a restoration network. The predicted block generated by HEVC intra prediction and the adjacent reconstructed L-shape pixels are combined into a larger block, which is fed into the multi-scale feature extraction network. There, the block is downsampled, feature maps are extracted at different scales, and the feature maps are then upsampled back to the original scale. The restoration network aggregates the feature maps of the different scales and applies convolution operations to generate the final, more accurate predicted block. Experimental results demonstrate that, compared with the HEVC reference software HM 16.9, the proposed intra prediction achieves a 3.4% BD-rate saving.

Second, a neural network based inter prediction method is proposed to improve the accuracy of inter prediction in digital video coding. Inter prediction in existing digital video coding standards generates the predicted block for the current block from the reference frames using motion estimation and motion compensation. However, motion estimation assumes that motion is translational, so it cannot handle more complex changes in natural video such as nonlinear illumination change, blurring, and zooming. To improve the accuracy of inter prediction, the proposed method exploits the neighboring reconstructed L-shape pixels of both the current block and the reference block. It consists of three sub-networks: the relation estimation network, the combination network, and the deep refinement network. The relation estimation network learns the relation between the current block and its predicted block. The combination network extracts feature maps from the learned relation and from the predicted block and concatenates them. The deep refinement network generates the final, more accurate predicted block. Experimental results demonstrate that, compared with the HEVC reference software HM 16.9, the proposed inter prediction achieves a 4.4% BD-rate saving.

Third, a convolutional neural network based in-loop filter algorithm and a GPU based in-loop filter parallel optimization algorithm are proposed. The former aims to improve the coding performance of the in-loop filter using a convolutional neural network, and the latter aims to reduce its encoding complexity. The in-loop filter plays an important role in existing digital video coding standards: it reduces blocking and ringing artifacts, improves the subjective quality of the reconstructed video, and also improves compression efficiency. This dissertation studies the in-loop filter from two aspects. On the one hand, a convolutional neural network based in-loop filter algorithm is proposed to improve coding performance. Specifically, a novel convolutional neural network structure is proposed in which the reconstructed video and the edge information are used together to improve the performance of the in-loop filter. The edge information is generated during the encoding process (e.g. block partitioning, residuals, and motion vectors). Experimental results demonstrate that, compared with the HEVC reference software HM 16.9, the proposed in-loop filter algorithm achieves a 4.6% BD-rate saving. On the other hand, the high complexity of the in-loop filter is a bottleneck for real-time HEVC encoding applications. To reduce this complexity, and with CPU+GPU multi-device cooperative coding for deep learning based video coding architectures in mind, a GPU-based in-loop filter parallel optimization algorithm is proposed. Specifically, a multi-device coordinated parallel strategy for HEVC encoding on CPU+GPU is proposed, which reduces the complexity of the in-loop filter in the HEVC encoder by processing deblocking and sample adaptive offset (SAO) in parallel on the GPU. Experimental results demonstrate that, compared with the open-source HEVC encoder x265, the proposed algorithm achieves a 47% encoding time saving.
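
As an illustration of the input construction described for the intra prediction method (the HEVC predicted block combined with the adjacent reconstructed L-shape pixels into one larger block), the following is a minimal sketch in plain Python. The abstract does not give the thickness of the L-shape or the exact layout, so the function name `build_network_input`, the `margin` parameter, and the top-left placement of the context are all assumptions made for illustration, not the dissertation's implementation.

```python
from typing import List

Block = List[List[int]]


def build_network_input(recon: Block, pred: Block,
                        x0: int, y0: int, margin: int) -> Block:
    """Combine an L-shaped reconstructed context with a predicted block.

    recon  : reconstructed frame as rows of pixel values; only pixels above
             and to the left of the current block are read.
    pred   : n x n predicted block from the conventional intra predictor.
    x0, y0 : column and row of the block's top-left pixel in the frame.
    margin : thickness of the L-shape in pixels (assumed parameter).

    Returns an (n + margin) x (n + margin) block: the top `margin` rows and
    left `margin` columns come from `recon`, the rest from `pred`.
    """
    n = len(pred)
    if n == 0 or any(len(row) != n for row in pred):
        raise ValueError("pred must be a non-empty square block")
    if margin < 1 or x0 < margin or y0 < margin:
        raise ValueError("this sketch needs a full L-shape inside the frame")
    if y0 + n > len(recon) or x0 + n > len(recon[0]):
        raise ValueError("block extends past the frame")

    out: Block = []
    for r in range(n + margin):
        frame_row = recon[y0 - margin + r]
        if r < margin:
            # Top arm of the L: the whole row is reconstructed pixels.
            out.append(list(frame_row[x0 - margin:x0 + n]))
        else:
            # Left arm of the L, followed by the predicted pixels.
            out.append(list(frame_row[x0 - margin:x0]) + list(pred[r - margin]))
    return out
```

For a 2 x 2 predicted block at position (2, 2) with `margin=2`, the result is a 4 x 4 block whose bottom-right 2 x 2 corner holds the prediction. A real encoder would also need a padding rule for blocks at the frame border, which this sketch simply rejects.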
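
The BD-rate savings quoted above are average bitrate differences at equal quality. The abstract does not say how they were computed, so the following is only a sketch of the commonly used Bjøntegaard calculation, assuming four rate/PSNR points per codec: log-rate is interpolated as a cubic function of PSNR, both curves are integrated over the shared PSNR range, and the mean difference is converted to a percentage. The function name `bd_rate` and its argument order are illustrative choices.

```python
import math
from typing import Sequence


def _lagrange(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate the cubic through four (xs, ys) points at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _integral(xs: Sequence[float], ys: Sequence[float],
              lo: float, hi: float) -> float:
    """Integrate the cubic over [lo, hi]; Simpson's rule is exact for cubics."""
    mid = 0.5 * (lo + hi)
    return (hi - lo) / 6.0 * (_lagrange(xs, ys, lo)
                              + 4.0 * _lagrange(xs, ys, mid)
                              + _lagrange(xs, ys, hi))


def bd_rate(rate_anchor: Sequence[float], psnr_anchor: Sequence[float],
            rate_test: Sequence[float], psnr_test: Sequence[float]) -> float:
    """Average bitrate change of the test codec versus the anchor, in percent.

    Negative values mean the test codec needs less bitrate for equal PSNR.
    """
    for rates, psnrs in ((rate_anchor, psnr_anchor), (rate_test, psnr_test)):
        if len(rates) != 4 or len(psnrs) != 4 or len(set(psnrs)) != 4:
            raise ValueError("each curve needs four points with distinct PSNR")
    log_anchor = [math.log10(r) for r in rate_anchor]
    log_test = [math.log10(r) for r in rate_test]

    # Only the PSNR interval covered by both curves is compared.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    if hi <= lo:
        raise ValueError("the two curves share no PSNR range")

    avg_diff = (_integral(psnr_test, log_test, lo, hi)
                - _integral(psnr_anchor, log_anchor, lo, hi)) / (hi - lo)
    return (10.0 ** avg_diff - 1.0) * 100.0
```

As a sanity check, if the test codec reaches the same PSNR values with exactly 90% of the anchor's bitrate at every point, the function returns approximately -10, i.e. a 10% BD-rate saving.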
Keywords/Search Tags: Video coding, deep learning, intra prediction, inter prediction, in-loop filter