| Video reconstruction aims to recover high-quality videos from their low-quality counterparts or measurements, and is widely used in communication, medical treatment, the military, and the consumer industry. Depending on the physical degradation model, video reconstruction tasks can be classified into super-resolution, compressive sensing, denoising, deblurring, frame interpolation, etc. This dissertation focuses on compressive video sensing (CVS) and video super-resolution (VSR), aiming to design robust and effective algorithms that improve recovery quality and provide theoretical support for practical applications. Specifically, for CVS, both optimization-based and deep-learning-based approaches are studied; for VSR, both natural videos and one kind of medical video, wireless capsule endoscopy video, are discussed. The main contributions of this dissertation are as follows: 1. CVS aims to recover original video signals from far fewer samples than required by the Nyquist-Shannon sampling theorem. The multi-hypothesis prediction scheme is widely used in traditional CVS reconstruction algorithms because of its effectiveness. However, its original design has two problems. First, the multi-hypothesis scheme is based on the Johnson-Lindenstrauss (JL) lemma, which holds only at high sampling rates. At low sampling rates, the requirements of the JL lemma may not be satisfied, so reconstruction performance degrades. To address this issue, a mixed-measurement-based multi-hypothesis reconstruction algorithm is proposed, which integrates supplementary measurements from side information via a specially designed regularization. Second, previous works consider only the temporal information from reference frames because of their high sampling rates. This may lead to unsatisfactory recovery when non-key frames are not clearly correlated with their reference frames. Thus, a novel multi-hypothesis scheme is proposed with an extended hypothesis set that includes temporal 
information from other non-key frames. To avoid the heavy computational complexity caused by a large hypothesis set, a filtering mechanism designed for the hypothesis set is employed. Experimental results demonstrate the superiority of the proposed frameworks over other traditional reconstruction algorithms. 2. With recent advances in deep learning, convolutional neural network (CNN)-based methods have come to dominate major video reconstruction tasks. In this work, a novel reconstruction network is proposed for CVS. The core idea of the multi-hypothesis prediction scheme from traditional CVS algorithms is introduced into the CNN to exploit temporal correlations between key and non-key frames. The network employs an embedded Gaussian function to measure correlations in a learned high-dimensional domain, so that the prediction results are more robust and accurate. After the multi-hypothesis prediction, a residual network is designed to recover the residuals between the multi-hypothesis prediction and the desired frame. The final result is obtained by adding the reconstructed residuals to the multi-hypothesis prediction. Unlike the block-wise reconstruction in existing deep neural network (DNN)-based compressive sensing architectures, this network builds a mapping from block measurements to a complete frame reconstruction, reducing block artifacts significantly. Benefiting from the nature of CNNs, forward propagation through the proposed network is extremely fast, making it suitable for real-time applications. Experimental results show that the proposed network achieves better recovery performance than existing DNN-based recovery methods and traditional iterative recovery algorithms. 3. VSR aims to recover high-resolution (HR) frames from their low-resolution (LR) counterparts. To exploit temporal correlations, most recovery networks face two challenges: 1) how to align consecutive frames containing motion, occlusion, and blurring and establish accurate temporal correspondences, and 2) how to effectively fuse aligned frames and balance their contributions. In this 
work, a novel VSR network is proposed to solve the above problems efficiently and effectively. For alignment, a temporal-spatial non-local module is employed to align each frame to the reference frame. Compared with existing alignment approaches, the proposed alignment operation integrates the global information of each frame through a weighted sum, leading to better alignment performance. For fusion, an attention-based progressive fusion framework is designed to integrate the aligned frames gradually. To penalize low-quality points in the aligned frames, an attention mechanism is employed for robust reconstruction. Experimental results demonstrate the superiority of the proposed network over other state-of-the-art methods. 4. The limited resources at the encoder of wireless capsule endoscopy (WCE) typically lead to poor video quality, interfering with the subjective diagnosis of clinical experts or the objective analysis in some automated systems. However, unlike with natural videos, the promise of applying VSR techniques to WCE applications has been offset by two significant challenges. First, HR counterparts of real endoscopic frames are unavailable for supervised learning. Second, utilizing temporal correlations to enhance recovery performance is difficult because of the poor quality of WCE videos. To tackle these two challenges, this dissertation develops a novel training dataset and an effective VSR network for practical WCE applications. Specifically, to construct the training dataset, clean natural videos and synthetic endoscopic videos are collected as the ground truth. Then, a set of complex degradation models is designed to generate their LR counterparts. Although real HR endoscopic videos remain unavailable during training, this “difficult” dataset gives the trained network strong generalization ability, so that it can also make reasonable inferences on real LR endoscopic videos. To utilize temporal correlations in WCE videos, a 
block-based non-local alignment module is proposed to build accurate correspondences among consecutive frames. The non-local operations can capture large-scale motions under poor video quality, and the block-wise design reduces computational complexity significantly. Extensive experiments demonstrate the superiority of the proposed VSR method over other state-of-the-art super-resolution approaches on real WCE videos. |
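
The embedded Gaussian correlation and the non-local weighted sum mentioned in contributions 2 and 3 can be illustrated with a minimal NumPy sketch. This is not the dissertation's network: the function names, array shapes, and random inputs are hypothetical, and the features are assumed to be already projected into the embedding domain, which in the actual method would be learned.

```python
import numpy as np

def embedded_gaussian_weights(query, keys):
    """Row-wise softmax of dot products between embedded features.

    query: (N, d) embedded features of the frame being predicted (assumed).
    keys:  (M, d) embedded features of the candidate hypotheses (assumed).
    Returns an (N, M) matrix whose rows are non-negative and sum to 1.
    """
    scores = query @ keys.T
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

def non_local_predict(query, keys, values):
    """Predict each query position as a weighted sum of all candidate values."""
    return embedded_gaussian_weights(query, keys) @ values

# Hypothetical toy data: 4 query positions, 16 candidates, 8-dim embeddings.
rng = np.random.default_rng(0)
query = rng.standard_normal((4, 8))
keys = rng.standard_normal((16, 8))
values = rng.standard_normal((16, 3))

weights = embedded_gaussian_weights(query, keys)
prediction = non_local_predict(query, keys, values)
```

Because every row of `weights` is a convex combination, each predicted value stays within the range spanned by the candidates, which is one way to read the robustness the abstract attributes to this kind of prediction.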