Font Size: a A A

Towards Efficient Image And Video Semantic Parsing

Posted on:2013-01-19Degree:DoctorType:Dissertation
Country:ChinaCandidate:X B LiuFull Text:PDF
GTID:1228330392455479Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Image and video content analysis is one of the fundamental tasks in computer vision andother related communities, e.g. multimedia computing, artifcial intelligence and automaticcontrol. In general, these image tasks include but not limited to image classifcation, imageregion segmentation, face recognition and image annotation, while the video tasks containobject recognition, object tracking, and motion segmentation. A good computer algorithmis always characterized with effective representation models, unifed theories and effcientinference algorithms. Although researchers have proposed quite a number of theories andalgorithms in the past literatures, these problems are still open due to their ill-ness natureand the well known semantic gap between low-level features and high-level semantics. Toenhance system performance, existing arts are studied and extended in terms of models,theories and algorithms for various particular applications, and further evaluated in publicbenchmark datasets. According to the underlying theories adopted, these studies can becategorized into three folders.First, the subspace learning theories are extended by a newly proposed nonnegativegraph embedding algorithm as well as a related optimization procedure. Subspace learn-ing is one of the most effective methods for machine learning and computer vision tasks,and graph structure is one of the most popular representation. Most of the subspace learn-ing algorithms, as studied, can be re-formulated with a unifed formula, and solved by aunifed solution platform, which are both mathematically proofed correct and convergent.Base on the nonnegative graph embedding framework, three new algorithms are developed,including:1) the projective non-negative matrix factorization, used to seek for the low-dimensional representation of a new testing sample;2) the nonnegative co-decompositionalgorithm, which, as the frst time, integrates nonnegative analysis, matrix factorization and multi-feature representations, to provide a powerful multi-modal learning algorithm for vari-ous computer vision problems;3) the nonnegative data factorization based image label com-pletion algorithm, which aims to recover the un-given labels for images. Experiments withcomparisons on multiple benchmark datasets show that these proposed methods can achievethe state-of-the-art performance.The second one is related to the sparse coding theory, which is one of the most im-portant advances in the past decade in signal processing and computer vision communities.However, its applications mainly fall in the felds of low-level vision problems, or problemshandling samples with clean structures. To address high-level image tasks, structured or lay-ered prior models are by nature in need. Following this methodology, a bi-layer sparsity prioris discovered and one bi-directional label propagation procedure is introduced to address theimage region labeling task; an effcient search based image region annotation algorithms isdeveloped to fully take advantage of the Internet image corpus; a new regularization ter-m is developed to impose the positiveness exclusive prior for sparse coding formula andfurther applied for multi-class semi-supervised learning task. These developed models andalgorithms are well evaluated in respective benchmarks, and are able to outperform otherstate-of-the-art methods.The third topic is related to the Bayesian modeling and inference for video parsing,which aims to track, locate, or classify the objects of interest. Contributions are made invarious aspects. In term of video representation, a new spatio-temporal graph as well asa hybrid template are proposed for representing objects in videos; in term of modeling, aunifed generative model is built to integrate the representation model and related priors;in term of inference, a new data-driven cluster sampling algorithm and a novel featurepursuit procedure are introduced to speed up the convergence in search of optimal solution;in term of implementation, a video surveillance system is developed to combine allabove techniques to provide parsed video trajectories. These models and algorithms are extensively evaluated on several public benchmark datasets.
Keywords/Search Tags:Computer Vision, Image/Video Parsing, Graph Embedding, NonnegativityAnalysis, Probabilistic Model, Sparse Coding
PDF Full Text Request
Related items