| As society and video-based software and tools have developed rapidly, the number of video files has grown sharply, and video transmission has become one of the main causes of Internet congestion. Besides increasing network speed to relieve this congestion, more efficient video compression technology is urgently needed. Focusing on the core issue of improving the compression rate, researchers have successively developed a series of video coding standards and deep-learning-based compression algorithms, and performance has advanced rapidly. Starting from end-to-end video compression based on deep learning and feature space, this thesis studies the key factors in improving video compression performance and proposes a structure-preserving algorithm that uses the original previous frame to assist motion estimation at the current time step, significantly improving compression efficiency over two baselines. In addition, from the perspective of bit allocation for end-to-end video compression frameworks, this thesis proposes a bit allocation algorithm based on semi-amortized variational inference, aimed at constructing an optimal rate allocation scheme, and shows experimentally that this algorithm achieves optimal bit allocation without any empirical model. The main research contents and innovations of this thesis are summarized as follows: Focusing on more efficient end-to-end video coding, this thesis proposes a scheme that uses the original previous frame to assist motion estimation at the current time step in order to obtain a structure-preserving motion field, which significantly improves performance. Specifically, this thesis finds that using only the previously decoded frame as a reference is sub-optimal, as it not only destroys the spatial structure of the inferred motion information but also loses
temporal consistency, thereby reducing overall coding efficiency. To overcome these problems, this thesis proposes to fully exploit the previously ignored original previous frame at the encoder side while still relying on the decoded previous frame at the decoder side. Specifically, this thesis estimates a motion field with superior spatial structure and temporal consistency by aggregating the motion predictions of the original reference frame and the decoded reference frame relative to the current frame. The proposed method is plug-and-play and reveals the significant role of the original previous frame in end-to-end video coding, which is valuable for future exploration. For the bit allocation problem in end-to-end video compression, this thesis proposes an optimal bit allocation scheme based on gradient descent optimization from the perspective of semi-amortized variational inference, and achieves the best performance improvement. Specifically, this thesis first proposes a continuous bit implementation method based on semi-amortized variational inference. Then, by changing the optimization target, it proposes a pixel-level implicit bit allocation method using iterative optimization. In addition, an accurate rate-distortion model is derived from the differentiability of end-to-end video compression. Using this exact rate-distortion model, the equivalence between the proposed method and bit allocation is proved, and the optimality of the algorithm is established. The proposed method is plug-and-play for all differentiable end-to-end video compression methods. In addition, GoP (group of pictures) level joint optimization of the variational posterior parameters (i.e., the latent representation) is carried out directly using semi-amortized variational inference to achieve rate allocation. This is essentially a new paradigm for bit allocation, which is also instructive for subsequent research. This thesis focuses on the research
of end-to-end video compression technology. The proposed structure-preserving motion estimation scheme and optimization-based bit allocation scheme improve video compression performance, providing an effective response to the growing volume of video. More importantly, by exploring the role of the original previous frame in the end-to-end compression framework and by using gradient descent to achieve bit allocation based on semi-amortized variational inference, this thesis provides a meaningful reference for future video compression research. |
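
The GoP-level optimization described in the abstract can be illustrated with a toy sketch. This is not the thesis's implementation. It assumes scalar frames, a hypothetical linear decoder in which each reconstruction is the previous reconstruction plus a scaled latent, a rate proxy of 0.5 * y^2 per latent (the negative log-likelihood of a unit Gaussian prior, up to a constant), and a hypothetical imperfect encoder (`amortized_init`) that supplies the starting latents. All function names and parameters are illustrative. The latents of the whole group are then refined jointly by gradient descent on the total rate-distortion loss, which is the semi-amortized step.

```python
def decode(latents, w=1.0):
    """Toy decoder (assumption): frame t = previous reconstruction + w * latent t."""
    recon, prev = [], 0.0
    for y in latents:
        prev = prev + w * y
        recon.append(prev)
    return recon


def rd_loss(frames, latents, lam=1.0, w=1.0):
    """Total GoP loss: rate proxy plus lam times squared-error distortion."""
    recon = decode(latents, w)
    distortion = sum((x - r) ** 2 for x, r in zip(frames, recon))
    rate = sum(0.5 * y * y for y in latents)  # assumed unit-Gaussian prior
    return rate + lam * distortion


def rd_grad(frames, latents, lam=1.0, w=1.0):
    """Analytic gradient of rd_loss with respect to every latent in the GoP.

    Latent k affects frame k and all later frames, so its gradient sums
    the reconstruction errors of frames k, k+1, ..., T-1.
    """
    recon = decode(latents, w)
    errors = [r - x for x, r in zip(frames, recon)]
    grads, suffix = [], 0.0
    for k in range(len(latents) - 1, -1, -1):
        suffix += errors[k]
        grads.append(latents[k] + 2.0 * lam * w * suffix)
    grads.reverse()
    return grads


def amortized_init(frames, gain=0.5):
    """Hypothetical imperfect encoder: a scaled frame difference per latent."""
    latents, prev = [], 0.0
    for x in frames:
        latents.append(gain * (x - prev))
        prev = x
    return latents


def refine_latents(frames, latents, lam=1.0, w=1.0, lr=0.02, steps=500):
    """Semi-amortized step: jointly refine all latents by gradient descent."""
    latents = list(latents)
    for _ in range(steps):
        grads = rd_grad(frames, latents, lam, w)
        latents = [y - lr * g for y, g in zip(latents, grads)]
    return latents


if __name__ == "__main__":
    gop = [1.0, 1.5, 1.2, 2.0]  # made-up scalar "frames"
    y0 = amortized_init(gop)
    y1 = refine_latents(gop, y0)
    print("loss before:", rd_loss(gop, y0))
    print("loss after: ", rd_loss(gop, y1))
```

Because every latent influences all later reconstructions in the group, the gradient of an early latent accumulates the errors of the frames that depend on it. In this sketch, that coupling is what lets a joint optimization shift bits between frames without an explicit empirical rate model.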