Research On The Theories And Methods Of Object Tracking And Segmentation In Video

Posted on: 2017-03-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Gu
Full Text: PDF
GTID: 1108330485988417
Subject: Communication and Information System
Abstract/Summary:
Video object tracking and segmentation is an important branch of computer vision. As a core underlying technology, it is a precondition for the reliable operation of many other algorithms. The goal of object tracking is to estimate the object’s state in each frame, given its initial state in the first frame. Object segmentation can be regarded as an extension of object tracking: in addition to tracking, it must determine the precise shape of the object of interest in each frame of the video. Video object tracking and segmentation face many challenges, such as deformation, occlusion and scale variation. To address these problems, this thesis proposes four novel algorithms. Two of them track rigid objects using mixtures of regression models, and experiments demonstrate their robustness against scale variation. The other two combine tracking with image segmentation theory to track and segment objects of any form (rigid or non-rigid), and they are strongly robust against occlusion and scale variation. The research has four components:

1. Traditional learning-based methods track the object by assuming that the relationship between object appearance and motion is either linear or non-linear. A single linear model leads to poor tracking performance, whereas a non-linear regression model must be specified in advance. To circumvent these drawbacks, this thesis adopts a mixture of linear models, which describes the true underlying model accurately and reduces the error caused by a single linear model. The proposed approach also estimates the parameters in a data-driven way, without specifying the form of the distribution in advance. Moreover, a fast learning strategy is proposed to reduce the computational complexity of learning: it inverts two low-dimensional matrices instead of one high-dimensional matrix, which is much faster and also enhances the robustness of the learned model. In particular, this approach models the relationship between the object appearance and the motion parameters, so it can estimate the object’s state directly, such as velocity, direction and scale.

2. The mixture of linear models is still limited because its mixing coefficients are independent of the input data. The capability of such models can be increased further by allowing the mixing coefficients themselves to be functions of the input variable. The result is known as a Mixture of Experts (MoE), in which the subspaces are divided by "soft boundaries". As an extension of the mixture of linear models, MoE describes the true underlying model more accurately than a single regression model. All of the model’s parameters can be learned independently by maximum likelihood. Moreover, an online algorithm updates the mixture model during tracking, which enhances the system’s robustness against noise.

3. The proposed learning-based approaches perform poorly on non-rigid objects. To solve this problem, a video object segmentation method based on a saliency filter is proposed. The algorithm recasts tracking and segmentation as a salient-object segmentation problem, and measures the object’s spatio-temporal coherence by computing the "relative saliency" across successive frames and the "absolute saliency" within an individual image. A Conditional Random Field is constructed from both saliencies, and the object is segmented by Graph Cut. The approach is region-based, and each region is described by a Local Log-Euclidean Covariance Matrix. This feature integrates all of the raw features regardless of the region’s size and shape, which enhances the system’s robustness against the deformation of non-rigid objects. Moreover, an online weighting strategy in the energy function robustly adjusts the importance of each cue according to the scene.

4. To improve the system’s accuracy, an approach based on low-rank sparse representation is proposed. By representing the elements of the current frame as sparse linear combinations of dictionary templates, the algorithm capitalizes on the inherent low-rank structure of representations that are learned jointly. The coefficients of the constrained representation serve as the measure of spatio-temporal coherence. An adaptive dictionary is also proposed to enhance the system’s robustness against occlusion. Combined with the proposed online weighting strategy, the system tracks and segments the object automatically. Moreover, the approach is extended to multi-object tracking without increasing the system’s learning complexity.
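
The difference between the first two components (mixing coefficients that are fixed versus mixing coefficients that depend on the input) can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the softmax gate, the linear experts and the toy one-dimensional data are all assumptions chosen for illustration, and no learning or online update is shown.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, bias, x):
    """Affine map w . x + b for a feature vector x."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def mixture_predict(experts, mixing, x):
    """Mixture of linear regressors: mixing coefficients do not depend on x."""
    return sum(p * linear(w, b, x) for p, (w, b) in zip(mixing, experts))

def moe_predict(experts, gates, x):
    """Mixture of Experts: mixing coefficients are a softmax of gate scores in x.

    The softmax gate is an assumed (though common) choice; it yields a soft
    boundary between the regions where each expert dominates.
    """
    g = softmax([linear(v, c, x) for v, c in gates])
    return sum(gk * linear(w, b, x) for gk, (w, b) in zip(g, experts))

# Hypothetical toy problem: approximate y = |x| with two linear experts.
experts = [([-1.0], 0.0), ([1.0], 0.0)]   # y = -x and y = x
gates = [([-10.0], 0.0), ([10.0], 0.0)]   # expert 0 favoured for x < 0, expert 1 for x > 0

for x in ([-2.0], [2.0]):
    fixed = mixture_predict(experts, [0.5, 0.5], x)  # averages both experts everywhere
    gated = moe_predict(experts, gates, x)           # close to |x| away from the boundary
    print(x, fixed, gated)
```

With equal fixed coefficients the two experts cancel for every input, whereas the gated model hands each half of the input range to the expert that fits it. In a tracker, the input would be an appearance feature and the output a motion parameter; the scalar values here only stand in for those.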
Keywords/Search Tags: video, object tracking and segmentation, Mixture of Regression, saliency, low-rank sparse representation