Moving object detection aims to extract moving objects from the background of video sequences. In intelligent video surveillance, it is one of the essential technologies and a critical step toward understanding image content. Moreover, its performance plays an important role in object recognition, behavior understanding and analysis, etc. In recent years, low-rank and sparse separation methods have drawn much attention and have been successfully applied to moving object detection. However, in complex scenarios and extreme conditions such as low illumination, poor visibility, and occlusion, these algorithms lack robustness and cannot preserve the contours of the detected moving objects. To address these problems, we propose spatiotemporal and cross-modal consistency constraints within the low-rank and sparse separation framework. The main work and contributions of the thesis are as follows:

(1) To introduce temporal information, we propose a novel approach that pursues high-order consistency for moving object detection within the low-rank and sparse separation framework. Surveillance video is typically captured at 20-30 frames per second, so consecutive frames are redundant and strongly correlated, and moving objects remain consistent over short time spans; effectively exploiting this temporal relationship helps to improve the accuracy and robustness of moving object detection. First, we apply a video segmentation algorithm to obtain the supervoxel information of the video, where a supervoxel is a high-order, three-dimensional voxel grouping that covers long-range spatial and temporal neighboring pixels of similar appearance. We use the temporal information of the supervoxels, together with a high-order potential, to model the temporal relationship of the foreground object. Furthermore, to better preserve the contours of moving objects, spatial structure constraints are built from the regional structure information of the supervoxels. Finally, we integrate the spatial pairwise smoothness and the supervoxel-based high-order consistency into a unified low-rank and sparse separation framework, and design a single optimization algorithm that learns the background model and the foreground mask simultaneously by iteratively employing the SOFT-IMPUTE algorithm and the graph cut algorithm. Extensive experiments on the benchmark datasets GTFD and CDnet suggest that our approach achieves superior performance over several state-of-the-art algorithms.
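The alternating optimization described above can be made concrete with a minimal sketch; it is not the thesis implementation. It alternates a SOFT-IMPUTE-style singular value thresholding step, which re-estimates a low-rank background from the entries currently labeled as background, with a per-frame graph cut over a residual-based unary term and a pairwise smoothness term. The supervoxel high-order consistency and the spatial structure constraints are omitted, the PyMaxflow dependency is an assumed stand-in for the graph cut solver, and all parameter values (lam, tau, beta) are illustrative.

```python
import numpy as np
import maxflow  # PyMaxflow; an assumed stand-in for the graph cut solver

def soft_impute_step(D, S, B_prev, lam):
    """One SOFT-IMPUTE-style update: fill the entries currently labeled as
    foreground (S == 1) with the previous background estimate, then apply
    singular value soft-thresholding to obtain a low-rank background."""
    X = D * (1.0 - S) + B_prev * S
    U, sig, Vt = np.linalg.svd(X, full_matrices=False)
    sig = np.maximum(sig - lam, 0.0)          # soft-threshold singular values
    return (U * sig) @ Vt

def graph_cut_step(frame, bg_frame, tau, beta):
    """Label one frame as foreground/background with a graph cut: the unary
    term is the residual |frame - background| against a constant cost tau,
    plus pairwise smoothness of weight beta between neighboring pixels
    (the supervoxel high-order potential of the thesis is omitted here)."""
    resid = np.abs(frame.astype(np.float64) - bg_frame)
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(frame.shape)
    g.add_grid_edges(nodes, beta)                        # pairwise smoothness
    g.add_grid_tedges(nodes, tau * np.ones_like(resid), resid)
    g.maxflow()
    return g.get_grid_segments(nodes).astype(np.float64)  # 1.0 = foreground

def detect(frames, n_iter=5, lam=10.0, tau=20.0, beta=5.0):
    """frames: (T, H, W) grayscale video; returns foreground masks (T, H, W)."""
    T, H, W = frames.shape
    D = frames.reshape(T, H * W).T.astype(np.float64)   # pixels-by-frames matrix
    S, B = np.zeros_like(D), D.copy()
    for _ in range(n_iter):
        B = soft_impute_step(D, S, B, lam)               # update background model
        masks = np.stack([graph_cut_step(frames[t], B[:, t].reshape(H, W),
                                         tau, beta) for t in range(T)])
        S = masks.reshape(T, H * W).T                    # update foreground support
    return masks
```

In the full framework, the graph cut energy would also include the supervoxel-based high-order consistency and the spatial structure constraints described above, rather than only the per-pixel residual and pairwise smoothness used in this sketch.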
(2) To address the poor performance in complex scenes such as low illumination and occlusion, we propose a novel approach that pursues cross-modal consistency for robust multi-modal moving object detection. RGB data captured by a visible-light camera generally offers high resolution and rich color, texture, and spatial structure features, but it is easily affected by uncontrolled illumination conditions. A thermal infrared camera images scene objects through their thermal radiation and is therefore largely unaffected by low illumination, occlusion, dense fog, and other adverse environments. To effectively exploit the complementarity of multi-modal information and improve the robustness of moving object detection, we propose a novel and robust multispectral foreground detection approach that captures cross-modality consistency among the heterogeneous modalities in a unified low-rank decomposition framework. Existing multi-modal detection methods fuse the source data adaptively either through addition rules or by learning a shared foreground mask matrix for the different modalities, which ignores how differently the same object can appear in the two imaging modalities. We instead construct a cross-modal graph to pursue cross-modality consistency between the RGB and thermal data within a unified low-rank decomposition framework, and introduce an appearance consistency term to further improve detection performance. Extensive experiments on the multi-modal benchmark dataset GTFD show that our method can effectively fuse the information from the two modalities and achieves promising performance in both grayscale and thermal modalities, even when the image pairs are misaligned.
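The cross-modal graph can be illustrated with a small sketch. The abstract does not specify the exact graph construction, so the snippet below is only one plausible, hypothetical instantiation: regions of the RGB and thermal frames are linked by an appearance-based affinity, and the resulting weights penalize disagreement between the two modalities' foreground scores; such a term would be added to the unified low-rank decomposition objective. All function names and parameters are assumptions.

```python
import numpy as np

def cross_modal_affinity(feat_rgb, feat_th, sigma=1.0):
    """Affinity W[i, j] between region i of the RGB frame and region j of the
    thermal frame, using a Gaussian kernel on feature distance (a hypothetical
    choice; the exact construction is not specified in the abstract)."""
    d2 = ((feat_rgb[:, None, :] - feat_th[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def cross_modal_consistency(f_rgb, f_th, W):
    """Consistency energy sum_ij W[i, j] * (f_rgb[i] - f_th[j])^2: foreground
    scores of strongly linked cross-modal regions are pushed to agree."""
    diff2 = (f_rgb[:, None] - f_th[None, :]) ** 2
    return float((W * diff2).sum())

# Toy usage: four regions per modality with 3-D appearance features.
rng = np.random.default_rng(0)
feat_rgb, feat_th = rng.random((4, 3)), rng.random((4, 3))
f_rgb, f_th = rng.random(4), rng.random(4)          # foreground scores in [0, 1]
W = cross_modal_affinity(feat_rgb, feat_th)
print(cross_modal_consistency(f_rgb, f_th, W))      # smaller = more consistent
```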