Tracking objects under extreme conditions is challenging, and the key to robust tracking is to effectively integrate cross-modal information from different sources. Constrained by sensor size, the captured images are often of low resolution. Consequently, research on cross-modal image super-resolution (CMISR) is important. CMISR aims to recover multiple high-resolution images from their low-resolution counterparts by effectively integrating cross-modal information. This thesis presents an extensive investigation of these issues and makes the following contributions:

First, we propose a deep bilateral learning network, BSSRNet, to super-resolve stereo images. We integrate information from both views of the stereo pair through a parallax attention mechanism; the bilateral framework then uses the edge information of one image to slice a bilateral grid, and the other view is super-resolved by the generated dynamic kernels. Experiments demonstrate that BSSRNet integrates information from both camera views while well preserving the edge information of the original images, and its speed and efficiency are among the state of the art.

Second, we propose JBGSR, a learned edge transformer network for guided thermal image super-resolution. A learned edge extractor extracts edge features from both modalities, and a transformer then integrates the cross-modal information. The integrated features are further fused with features extracted from the original thermal image. With this two-stage information fusion, the proposed network extracts useful information from the visible image under the guidance of the thermal image's edge information. Experimental results on benchmark datasets demonstrate that JBGSR effectively reconstructs high-resolution thermal images with state-of-the-art efficiency.

Third, we analyze the characteristics of cross-modal images and process the low-resolution thermal images with the aforementioned JBGSR. Since the super-resolved thermal image is well aligned with the visible image, we propose EStaple, an algorithm that tracks objects using cross-modal information in a correlation-filter-based framework. Our method fuses information from both modalities at three levels (i.e., pixel level, feature level, and decision level). Pixel-level fusion is performed at the image preprocessing stage; at the feature level, spatial and spectral attention mechanisms fuse information adaptively; the final result is obtained by decision-level fusion. Extensive experiments on RGB-T benchmark datasets demonstrate the effectiveness of the information-fusion mechanism of the proposed tracker, and the efficiency of EStaple is among the state of the art.
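As an illustration of the feature-level step of such three-level fusion, the sketch below applies channel ("spectral") and spatial attention weights to RGB and thermal feature maps before averaging them. This is a minimal, hypothetical simplification for exposition only: the function `attention_fuse` and its self-derived weighting scheme are assumptions, not the actual EStaple implementation.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(feat_rgb, feat_t):
    """Fuse two (C, H, W) feature maps with channel ('spectral') and
    spatial attention. Hypothetical sketch: weights are derived from
    the features themselves rather than learned, unlike a real tracker."""
    attended = []
    for f in (feat_rgb, feat_t):
        # Channel attention: global average pooling -> per-channel weight.
        chan_w = softmax(f.mean(axis=(1, 2)), axis=0)[:, None, None]
        # Spatial attention: channel-mean map -> per-pixel weight,
        # rescaled so the mean weight is 1.
        spat = f.mean(axis=0)
        spat_w = softmax(spat.reshape(-1), axis=0).reshape(spat.shape) * spat.size
        attended.append(f * chan_w * spat_w)
    # Equal-weight average of the two attended maps; a learned or
    # response-driven combination would replace this in practice.
    return 0.5 * (attended[0] + attended[1])
```

In a full tracker, the fused map would feed the correlation filter, and decision-level fusion would then combine the per-modality response maps.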