| As carriers of visual information, images and videos are at the core of computer vision research. Many visual tasks are built on images, including image classification, object detection and image segmentation, and in these tasks effective image feature extraction is key to algorithm performance. Metric learning is a mainstream approach to feature learning: by learning a low-dimensional embedding space, it pulls images of the same class closer together and pushes images of different classes farther apart. Image classification based on metric learning has been applied successfully in real-world settings such as face recognition, image retrieval and texture classification. Relying on large datasets, deep convolutional neural networks (CNNs) optimized iteratively by gradient descent have been widely studied and applied, while some studies instead learn CNNs with non-iterative optimization methods, such as PCANet. Methods in this category are simple yet efficient and perform well, especially in classification tasks where only a small amount of data is available. On the other hand, high-resolution video frames contain more detailed information and therefore play an important role in helping people acquire information and perceive the world. Video super-resolution (VSR) addresses the mapping of video from low resolution to high resolution, with a wide range of applications such as video surveillance, high-definition television and satellite imagery. Because of camera or object motion, super-resolving a low-resolution frame (the reference frame) commonly requires aligning multiple neighboring frames (supporting frames) with the reference frame. Many high-performing methods, such as EDVR, use deformable convolution for implicit inter-frame alignment. Building on the strengths of deep learning, this thesis studies two computer vision tasks: image feature representation and video super-resolution. The main contributions are as follows:
(1) Marginal Fisher Analysis based CNN for image representation. This work extends the line of research that learns CNNs with non-iterative optimization methods and proposes a deep Marginal Fisher Analysis (MFA) based CNN, termed DMNet. By optimizing the CNN filters with a local MFA, it addresses the limitation of PCANet-like CNNs when the samples do not follow a Gaussian distribution. Within a graph embedding framework, the convolution filters are optimized to maximize inter-class discriminability among marginal points while minimizing intra-class distance, and a deep network can be constructed by cascading MFA convolution layers. In addition, a binary random hash (BSH) method is proposed as a nonlinear layer; it selects features according to the importance of the feature maps produced by convolution, so that the network generates more robust features. Experimental results show that the proposed method achieves state-of-the-art results among non-iteratively optimized CNN methods, and ablation studies verify the effectiveness of the proposed modules in DMNet.
(2) Video super-resolution network with frequency decomposition alignment. Existing alignment methods based on deformable convolution simply align all features of a frame, without considering that high- and low-frequency components differ in how difficult they are to align. This work introduces a differentiable frequency decomposition method into the super-resolution network architecture and proposes a frequency decomposition alignment module, termed FDA, which reduces interference from other information during video frame alignment and thereby improves alignment between frames. In addition, a novel loss function is added during training that requires the reconstructed frame to recover the original low-resolution frame when downsampled, keeping the reconstructed frame consistent with the information in the original reference frame. To verify the effectiveness of the FDA module, we applied it to an EDVR-like VSR network and compared it with several well-performing methods. Experimental results show that the FDA module effectively improves the performance of the network, and ablation studies further confirm the importance of the supporting frames' high-frequency features for reconstructing the high-resolution reference frame. |
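The MFA-based filter optimization in contribution (1) can be illustrated with a minimal NumPy sketch of standard Marginal Fisher Analysis applied to image patches in the PCANet style. This is not DMNet's actual procedure: the helper names (`extract_patches`, `mfa_graphs`, `mfa_filters`), the neighbor counts `k1` and `k2`, the PCA pre-step and the regularizer are all hypothetical choices made for illustration. Each patch inherits its image's label; an intrinsic graph links same-class nearest neighbors, a penalty graph links the closest marginal pairs across classes, and the filters are the generalized eigenvectors with the smallest ratio of intra-class scatter to marginal inter-class scatter.

```python
import numpy as np

def extract_patches(img, k):
    """All k x k patches of a 2-D image as columns, each with its own mean removed
    (PCANet-style preprocessing; an assumption for this sketch)."""
    h, w = img.shape
    cols = [img[i:i + k, j:j + k].ravel()
            for i in range(h - k + 1) for j in range(w - k + 1)]
    p = np.stack(cols, axis=1)                      # shape (k*k, n_patches)
    return p - p.mean(axis=0, keepdims=True)

def mfa_graphs(X, y, k1=5, k2=20):
    """Intrinsic graph W (k1 same-class neighbors per sample) and penalty graph Wp
    (k2 closest between-class pairs per class), as in standard MFA."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * X.T @ X    # squared Euclidean distances
    W = np.zeros((n, n))
    Wp = np.zeros((n, n))
    for i in range(n):
        same = np.flatnonzero(y == y[i])
        same = same[same != i]
        nn = same[np.argsort(dist[i, same])[:k1]]
        W[i, nn] = W[nn, i] = 1.0
    for c in np.unique(y):
        inc, out = np.flatnonzero(y == c), np.flatnonzero(y != c)
        d = dist[np.ix_(inc, out)]
        r, s = np.unravel_index(np.argsort(d, axis=None)[:k2], d.shape)
        Wp[inc[r], out[s]] = Wp[out[s], inc[r]] = 1.0
    return W, Wp

def mfa_filters(X, y, k, n_filters, k1=5, k2=20, reg=1e-6):
    """Return n_filters k x k filters minimising intra-class / marginal inter-class scatter."""
    # PCA pre-step: drop null directions (e.g. the constant direction removed by
    # patch-mean subtraction) that would otherwise give a degenerate 0/0 ratio.
    s, P = np.linalg.eigh(X @ X.T)
    P = P[:, s > 1e-8 * s.max()]
    Z = P.T @ X
    W, Wp = mfa_graphs(Z, y, k1, k2)
    L = np.diag(W.sum(axis=1)) - W                  # graph Laplacians
    Lp = np.diag(Wp.sum(axis=1)) - Wp
    A = Z @ L @ Z.T                                 # intra-class compactness
    B = Z @ Lp @ Z.T                                # inter-class separability (marginal pairs)
    B += reg * np.trace(B) / B.shape[0] * np.eye(B.shape[0])
    # Solve A w = lambda B w via B^{-1/2}; eigh sorts eigenvalues ascending,
    # so the first columns give the smallest (best) ratios.
    e, V = np.linalg.eigh(B)
    B_isqrt = V @ np.diag(1.0 / np.sqrt(e)) @ V.T
    _, U = np.linalg.eigh(B_isqrt @ A @ B_isqrt)
    F = P @ (B_isqrt @ U[:, :n_filters])            # back to pixel space, (k*k, n_filters)
    F /= np.linalg.norm(F, axis=0, keepdims=True)
    return F.T.reshape(n_filters, k, k)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    imgs = rng.standard_normal((6, 12, 12))         # toy "images", two classes
    labels = np.repeat([0, 1], 3)
    X = np.concatenate([extract_patches(im, 3) for im in imgs], axis=1)
    y = np.repeat(labels, 10 * 10)                  # each 12x12 image yields 10x10 patches
    filters = mfa_filters(X, y, k=3, n_filters=4)
    print(filters.shape)                            # (4, 3, 3)
```

Unlike PCA filters, which only capture directions of large total variance, these filters use the labels through the two graphs, which is the property the abstract contrasts with PCANet-like CNNs. Stacking layers would repeat the same procedure on the filter responses.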
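The abstract does not specify which differentiable frequency decomposition FDA uses. As one plausible illustration only, the NumPy sketch below assumes a fixed Gaussian low-pass filter: the low-frequency component is the blurred feature map and the high-frequency component is the residual. Both are linear functions of the input, which is what makes such a split differentiable inside a network. The function names, the 3-sigma kernel truncation and `sigma` are hypothetical.

```python
import numpy as np

def gaussian_kernel1d(sigma):
    """Normalised 1-D Gaussian kernel with radius 3*sigma (assumed truncation)."""
    r = max(1, int(round(3 * sigma)))
    x = np.arange(-r, r + 1, dtype=float)
    g = np.exp(-0.5 * (x / sigma) ** 2)
    return g / g.sum()

def low_pass(channel, sigma):
    """Separable Gaussian blur of one (H, W) channel with reflect padding."""
    g = gaussian_kernel1d(sigma)
    r = len(g) // 2
    p = np.pad(channel, r, mode="reflect")

    def conv(v):
        return np.convolve(v, g, mode="valid")      # 'valid' trims the padding again

    rows = np.apply_along_axis(conv, 1, p)           # shape (H + 2r, W)
    return np.apply_along_axis(conv, 0, rows)        # shape (H, W)

def frequency_decompose(feat, sigma=1.0):
    """Split a (C, H, W) feature map into low- and high-frequency components.
    Both outputs are linear in `feat`, so the split is differentiable."""
    low = np.stack([low_pass(c, sigma) for c in feat])
    high = feat - low                                # residual holds the high frequencies
    return low, high

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((2, 16, 16))
    low, high = frequency_decompose(feat, sigma=1.5)
    print(np.allclose(low + high, feat))             # True
```

In a deep-learning framework the same split would typically be a fixed, non-learnable depthwise convolution, so that the low- and high-frequency parts of each supporting frame could be aligned separately; how FDA actually routes and fuses these parts is not described in the abstract.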
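The downsampling-consistency loss in contribution (2) can be sketched as follows. The abstract names neither the downsampling operator, the distance, nor the weighting, so this NumPy version assumes box (average-pool) downsampling by an integer `scale`, an L1 distance, and a hypothetical weight `lam`; bicubic downsampling would be another natural choice. The function names are illustrative only.

```python
import numpy as np

def downsample_avg(img, scale):
    """Box-filter downsampling of a (C, H, W) image by an integer factor (assumed operator)."""
    c, h, w = img.shape
    assert h % scale == 0 and w % scale == 0, "H and W must be divisible by scale"
    # Split each spatial axis into (blocks, scale) and average inside each block.
    return img.reshape(c, h // scale, scale, w // scale, scale).mean(axis=(2, 4))

def consistency_loss(sr, lr, scale):
    """Mean absolute error between the downsampled SR frame and the LR reference frame."""
    return np.abs(downsample_avg(sr, scale) - lr).mean()

def total_loss(sr, hr, lr, scale, lam=0.1):
    """L1 reconstruction loss plus a weighted LR-consistency term (lam is hypothetical)."""
    return np.abs(sr - hr).mean() + lam * consistency_loss(sr, lr, scale)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hr = rng.random((3, 8, 8))
    lr = downsample_avg(hr, 2)
    print(consistency_loss(hr, lr, 2))               # 0.0
```

Because box downsampling only averages each `scale x scale` block, many different SR frames satisfy the constraint (nearest-neighbor upsampling of `lr` does, for example). The term therefore anchors the coarse content of the output to the observed input, while the reconstruction term supplies the detail.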