Font Size: a A A

Video registration for multimodal surveillance systems

Posted on:2013-05-24Degree:Ph.DType:Thesis
University:Ecole Polytechnique, Montreal (Canada)Candidate:Torabi, AtousaFull Text:PDF
GTID:2458390008470904Subject:Engineering
Abstract/Summary:
Image registration approaches for an automatic multimodal video surveillance system are divided into two general approaches based on the range of captured scene: the approaches that are appropriate for long-range scenes, and the approaches that are suitable for close-range scenes. In the literature, this subject of research is not well documented, especially for close-range surveillance application domains. Our research is focused on novel image registration solutions for both close-range and long-range scenes featuring multiple humans. The proposed solutions are presented in the four articles included in this thesis. Our registration methods are applicable for further video analysis such as tracking, human localization, behavioral pattern analysis, and object categorization.;For far-range video surveillance, we propose an iterative system that consists of simultaneous thermal-visible video registration, sensor fusion, and people tracking. Our video registration is based on a RANSAC object trajectory matching, which estimates an affine transformation matrix to globally transform foreground objects of one image on another one. Our proposed multimodal surveillance system is based on a novel feedback scheme between registration and tracking modules that augments the performance of both modules iteratively over time. Our methods are designed for online applications and no camera calibration or special setup is required.;For close-range video surveillance applications, we introduce Local Self-Similarity (LSS) as a viable similarity measure for matching corresponding human body regions of thermal and visible images. We also demonstrate theoretically and quantitatively that LSS, as a thermal-visible similarity measure, is more robust to differences between corresponding regions' textures than the Mutual Information (MI), which is the classic multimodal similarity measure. Other viable local image descriptors including Histogram Of Gradient (HOG), Scale Invariant Feature Transform (SIFT), and Binary Robust Independent Elementary Feature (BRIEF) are also outperformed by LSS.;Moreover, we propose a LSS-based dense local stereo correspondence algorithm based on a voting approach, which estimates a dense disparity map for each foreground region in the image. The resulting disparity map can then be used to align the reference image on the second image. We demonstrate that our proposed LSS-based local registration method outperforms similar state-of-the-art MI-based local registration methods in the literature. Our experiments were carried out using realistic human monitoring scenarios in a close-range scene.;Due to the shortcomings of local stereo correspondence approaches for estimating accurate disparities in depth discontinuity regions, we propose a novel stereo correspondence method based on a global optimization approach. We introduce a stereo model appropriate for thermal-visible image registration using an energy minimization framework and Belief Propagation (BP) as a method to optimize the disparity assignment via an energy function. In this method, we integrated color and motion visual cues as a soft constraint into an energy function to improve disparity assignment accuracy in depth discontinuities. Although global correspondence approaches are computationally more expensive compared to Winner Take All (WTA) local correspondence approaches, the efficient BP algorithm and parallel processing programming (openMP) in C++ that we used in our implementation, speed up the processing time significantly and make our methods viable for video surveillance applications. Our methods are implemented in C++ using OpenCV library and object-oriented programming.;Our methods are designed to be integrated easily for further video analysis. In other words, the input data of our methods could come from two synchronized online video streams. For further analysis a new module could be added in our frame-by-frame algorithmic diagram. Further analysis might be object tracking, human localization, and trajectory pattern analysis for multimodal long-range monitoring applications, and behavior pattern analysis, object categorization, and tracking for close-range applications. (Abstract shortened by UMI.).
Keywords/Search Tags:Video, Registration, Multimodal, Surveillance, System, Pattern analysis, Image, Approaches
Related items