A New Second Order Statistics Based Descriptor And Its Application In Object Detection And Tracking

Posted on: 2011-10-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X P Hong
Full Text: PDF
GTID: 1118330338989411
Subject: Computer application technology
Abstract/Summary:
With the growing number of real-world applications in visual surveillance, human-machine interaction, and intelligent transport systems, object detection and object tracking have been among the most active research areas in computer vision over the last few decades. The main challenges for object detection and tracking are twofold: large intra-class divergence and low inter-class separability. This dissertation addresses both challenges by developing region descriptors based on second-order statistics, which show high discriminative power and good robustness in many computer vision tasks. However, operations on the traditional descriptors of this kind, i.e., covariance matrices, must be carried out on a Riemannian manifold and are usually computationally demanding. To solve this problem, this dissertation proposes a novel region descriptor based on second-order statistics, named Sigma set, and defines how to calculate the distance between two Sigma sets and the mean of multiple Sigma sets. In addition, it extends the proposed region descriptor to represent a whole image and studies how to combine the descriptor with commonly used machine learning algorithms. Furthermore, it proposes a robust method for modeling the target in a real-time tracking environment. Finally, it discusses how to design efficient boosted weak classifiers in which the distance metric of commonly used region descriptors is embedded. These techniques are described in detail below.

First, region descriptors based on second-order statistics, which capture the correlations among the features extracted inside an object region, are low-dimensional, discriminative, and robust against variations in illumination, view, pose, etc.
Despite these advantages, the traditional region descriptors based on second-order statistics, i.e., covariance matrices (COVs), do not lie in a Euclidean space, so computationally demanding operations on a Riemannian manifold are required to measure the distance between two COVs and to calculate the mean of COVs accurately. To solve this problem, this dissertation proposes a novel, efficient region descriptor, named Sigma set, which encodes the second-order statistics of a given image region in the form of a small set of vectors. A Sigma set can be uniquely constructed through the Cholesky decomposition of the covariance matrix. Like covariance matrix descriptors, Sigma sets are low-dimensional, powerful, and robust. Moreover, compared with COVs, Sigma sets are not only more efficient when calculating the distance between two descriptors and the mean of multiple descriptors, but also easier to enrich with first-order statistics. Experimental results in texture classification verify the effectiveness and efficiency of the proposed region descriptor.

Second, this dissertation extends Sigma sets to holistic representations of images. The Sigma set image representation is capable of encoding the information of a given image and achieves accuracy comparable to that of the COV representation. In addition, compared with the traditional covariance matrix image representation, the Sigma set image representation is more efficient because it lies in a vector space. Experiments show that the Sigma set image representation works well in object detection as well as object classification.

Third, in real-time object tracking, adapting promptly to the appearance changes of moving objects is crucial. Based on a multiple-patch Sigma set representation, this dissertation proposes to model each patch with an online one-class support vector machine algorithm, the Implicit online Learning with Kernels Model (ILKM).
ILKM is simple, efficient, and capable of learning a robust online target predictor in the presence of appearance changes. The responses of the ILKMs corresponding to the multiple target patches are fused by an arbitrator, which makes the decision and triggers the model update. The benefits of the proposed appearance model are fourfold: first, the patch-based representation increases robustness against partial occlusions; second, ILKM efficiently learns a robust predictor from the observed samples; third, the arbitrator, which includes an analysis of possible partial occlusions, further improves robustness; fourth, the Bayesian inference framework ensures the efficiency of the proposed tracking approach. Experimental results demonstrate that the proposed tracking approach is effective and efficient in ever-changing and cluttered scenes.

Fourth, different region descriptors are compared using different distance metrics. In most cases, high performance depends on using not only the region descriptor but also its associated distance metric. Traditional non-linear classifiers can incorporate the distance metric of a specific region descriptor through their kernel functions. However, the computational cost of traditional non-linear classifiers is extremely high. To solve this problem, we propose sample pre-mapping as a novel pre-processing step applied before samples are fed into the weak classifiers in Boosting. Sample pre-mapping, from the original input space to another space of the same dimensionality, is a point-dependent mapping derived from non-linear kernel functions. Linear weak classifiers in the pre-mapped space provide effective and efficient approximations of the optimal separating hyperplanes in the kernel-induced high-dimensional space. Hence the resulting classifiers achieve accuracy comparable to that of kernel methods at the computational cost of linear classifiers in both training and detection.
Experimental results in both pedestrian detection and car detection on public datasets show that weak learning in the pre-mapped space achieves accuracy comparable to that of traditional non-linear classifiers at the computational cost of linear classifiers.
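
The abstract states that a Sigma set is built from the Cholesky decomposition of a region's covariance matrix and that distances and means are cheap because the descriptor lies in a vector space. The following is only a minimal sketch of that idea in Python with NumPy, not the dissertation's implementation. The function names, the scaling factor `alpha = sqrt(d)`, the small ridge term, the element-wise Euclidean distance, and the element-wise mean are all assumptions made for illustration; the abstract does not give the exact definitions.

```python
import numpy as np


def sigma_set(features, include_mean=False):
    """Build a Sigma-set-like descriptor from per-pixel features.

    features: array of shape (n_pixels, d), one feature vector per pixel.
    Returns an array of shape (2 * d, d) whose rows are the set's vectors.
    """
    features = np.asarray(features, dtype=float)
    d = features.shape[1]
    cov = np.atleast_2d(np.cov(features, rowvar=False))
    # Assumption: a tiny ridge keeps the matrix positive definite.
    cov = cov + 1e-9 * np.eye(d)
    # cov = L @ L.T with L lower triangular; the factor is unique.
    L = np.linalg.cholesky(cov)
    # Assumption: scaling by sqrt(d) so the 2d points reproduce cov.
    alpha = np.sqrt(d)
    cols = alpha * L.T                  # row i is the scaled i-th column of L
    points = np.vstack([cols, -cols])   # symmetric set of 2d vectors
    if include_mean:
        # Enrich with first-order statistics by shifting to the region mean.
        points = points + features.mean(axis=0)
    return points


def sigma_distance(a, b):
    """Assumed distance: Euclidean norm of the element-wise difference."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))


def sigma_mean(sets):
    """Assumed mean: element-wise average of several Sigma sets."""
    return np.mean(np.stack(sets), axis=0)
```

Under these assumptions, the rows of the returned array have a sample covariance (normalized by 2d) equal to the region's covariance matrix, so no second-order information is lost, while distance and mean reduce to ordinary vector operations instead of Riemannian ones.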
Keywords/Search Tags: object detection, object tracking, region descriptor, covariance matrix, kernel method, boosting