
Learning Low-rank And Sparse Representation For Video Object Extraction And Tracking

Posted on: 2017-09-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C L Li
Full Text: PDF
GTID: 1318330485964104
Subject: Computer application technology

Abstract/Summary:
As fundamental problems in computer vision, video object extraction and tracking are core techniques in visual surveillance systems. Although much progress has been made recently, they remain very challenging due to the complexity of data, scenes and environments. To address these issues, this thesis investigates low-rank and sparse representation models for video object extraction and tracking, including regularized low-rank representation for video object segmentation, weighted low-rank decomposition for grayscale-thermal foreground detection, patch-based dynamic graph learning for object tracking, and collaborative sparse representation for grayscale-thermal tracking.

First, we propose a video object segmentation framework based on regularized low-rank representation. Taking supervoxels as graph nodes, we employ the low-rank representation model to refine their affinities, so that the refined affinities are robust to both sparse corruptions and dense Gaussian noise. To enhance the discriminative ability between supervoxels, we integrate a discriminative replication prior into the low-rank representation model, yielding the regularized low-rank representation model. We also develop an efficient algorithm, sub-optimal low-rank decomposition, to optimize the proposed model. In addition, to process arbitrarily long videos within limited computation and memory budgets, we extend the proposed method into a streaming one. Both unsupervised and interactive tasks are performed based on the proposed model, and the superior performance against other methods on public benchmarks demonstrates the effectiveness of the proposed approaches.

Second, we propose a general multimodal framework based on weighted low-rank decomposition for moving object detection, which can effectively overcome the shortcomings of the visible spectrum under certain conditions, such as background clutter, illumination variation and fog.
Specifically, we introduce a quality weight for each modality, and thus jointly model the low-rank structured backgrounds of the different modalities, the sparse foreground masks shared by all modalities, and the contiguity of both foregrounds and backgrounds. The proposed model fuses the different modal information adaptively to detect moving objects robustly. To improve efficiency, we present a near-real-time algorithm based on edge-preserving filtering. In addition, we create a standard platform for multimodal foreground detection, which includes 25 aligned video pairs. The created benchmark will benefit research progress on multimodal foreground detection and related problems.

Third, a patch-based dynamic graph learning algorithm is proposed for visual tracking. The proposed method mitigates the effects of background information and thus alleviates the model drift problem. Specifically, we partition the object bounding box into non-overlapping image patches, and then assign each patch a weight describing its importance to the target object. Unlike the conventional 8-neighbor graph, we take image patches as graph nodes, and learn a dynamic graph by exploiting the global low-rank structure and the local linear relationships of patch descriptors. Moreover, we solve for the patch weights and the graph structure jointly in a semi-supervised way. In addition, we present a real-time algorithm to optimize the proposed model. Finally, we integrate the optimized weights into tracking and model updating within the tracking-by-detection framework, significantly improving tracking performance.

Finally, we propose a collaborative sparse representation based multimodal tracking method within the Bayesian filtering framework. Conventional multimodal tracking methods treat each modality equally, which may significantly limit tracking performance under occasional perturbation or malfunction of individual sources.
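The quality-weighted fusion idea described above, one weight per modality, with weights normalized and used to combine per-modality evidence, can be sketched in a few lines. This is not the thesis's joint low-rank model; it only illustrates the weighting-and-fusion step in isolation, and the residual-to-weight mapping (a Gaussian kernel) and all names are assumptions:

```python
import numpy as np

def quality_weights(residuals, sigma=1.0):
    """Map per-modality reconstruction residuals to normalized
    quality weights: lower residual -> higher weight."""
    r = np.asarray(residuals, dtype=float)
    w = np.exp(-r**2 / (2.0 * sigma**2))   # Gaussian kernel (assumed form)
    return w / w.sum()

def fuse_foreground(masks, weights, thresh=0.5):
    """Weighted vote over per-modality soft foreground masks,
    thresholded into a single binary foreground mask."""
    fused = sum(w * m for w, m in zip(weights, masks))
    return fused > thresh
```

Under this scheme a degraded modality (e.g. a foggy visible channel with a large background-model residual) is automatically down-weighted, so the fused mask is dominated by the reliable modality.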
Therefore, this thesis introduces a quality weight for each modality to integrate the different modalities adaptively. In particular, the quality weight of one modality is determined by the corresponding reconstruction residual and by the discriminative ability between the object and its surrounding background, and is optimized jointly with the sparse codes. In addition, we build a standard benchmark for multimodal tracking, which includes 50 aligned video pairs, 22 baseline methods and 2 evaluation metrics. This benchmark contributes a comprehensive evaluation platform for multimodal tracking and related research problems.
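In the collaborative sparse representation setting, each candidate is coded over a per-modality template dictionary, and the reconstruction residual of each modality feeds its quality weight. The sketch below shows only that residual-driven weighting with a plain ISTA sparse coder; the thesis additionally uses a discriminative term and optimizes weights and codes jointly, which is omitted here, and the function names and the exponential weight form are illustrative assumptions:

```python
import numpy as np

def ista(D, y, lam=0.1, n_iter=200):
    """Sparse coding: min_x 0.5*||y - D x||^2 + lam*||x||_1 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ x - y)              # gradient of the quadratic term
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0)  # soft threshold
    return x

def modality_weights(D_list, y_list, lam=0.1):
    """One quality weight per modality from its sparse
    reconstruction residual; lower residual -> higher weight."""
    res = []
    for D, y in zip(D_list, y_list):
        x = ista(D, y, lam)
        res.append(np.linalg.norm(y - D @ x))
    w = np.exp(-np.asarray(res))           # assumed residual-to-weight map
    return w / w.sum()
```

A modality whose observation is well explained by its template dictionary keeps a high weight, while a perturbed or malfunctioning source is suppressed, which is exactly the adaptive-integration behavior the framework targets.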
Keywords/Search Tags: Multimodal Dataset Construction, Video Statistical Prior, Information Fusion, Weighted Low-Rank Decomposition, Collaborative Sparse Representation, Dynamic Graph Learning, Joint Optimization