Towards Efficient Image And Video Semantic Parsing

Posted on:2013-01-19

Degree:Doctor

Type:Dissertation

Country:China

Candidate:X B Liu

Full Text:PDF

GTID:1228330392455479

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

Image and video content analysis is one of the fundamental tasks in computer vision andother related communities, e.g. multimedia computing, artifcial intelligence and automaticcontrol. In general, these image tasks include but not limited to image classifcation, imageregion segmentation, face recognition and image annotation, while the video tasks containobject recognition, object tracking, and motion segmentation. A good computer algorithmis always characterized with effective representation models, unifed theories and effcientinference algorithms. Although researchers have proposed quite a number of theories andalgorithms in the past literatures, these problems are still open due to their ill-ness natureand the well known semantic gap between low-level features and high-level semantics. Toenhance system performance, existing arts are studied and extended in terms of models,theories and algorithms for various particular applications, and further evaluated in publicbenchmark datasets. According to the underlying theories adopted, these studies can becategorized into three folders.First, the subspace learning theories are extended by a newly proposed nonnegativegraph embedding algorithm as well as a related optimization procedure. Subspace learn-ing is one of the most effective methods for machine learning and computer vision tasks,and graph structure is one of the most popular representation. Most of the subspace learn-ing algorithms, as studied, can be re-formulated with a unifed formula, and solved by aunifed solution platform, which are both mathematically proofed correct and convergent.Base on the nonnegative graph embedding framework, three new algorithms are developed,including:1) the projective non-negative matrix factorization, used to seek for the low-dimensional representation of a new testing sample;2) the nonnegative co-decompositionalgorithm, which, as the frst time, integrates nonnegative analysis, matrix factorization and multi-feature representations, to provide a powerful multi-modal learning algorithm for vari-ous computer vision problems;3) the nonnegative data factorization based image label com-pletion algorithm, which aims to recover the un-given labels for images. Experiments withcomparisons on multiple benchmark datasets show that these proposed methods can achievethe state-of-the-art performance.The second one is related to the sparse coding theory, which is one of the most im-portant advances in the past decade in signal processing and computer vision communities.However, its applications mainly fall in the felds of low-level vision problems, or problemshandling samples with clean structures. To address high-level image tasks, structured or lay-ered prior models are by nature in need. Following this methodology, a bi-layer sparsity prioris discovered and one bi-directional label propagation procedure is introduced to address theimage region labeling task; an effcient search based image region annotation algorithms isdeveloped to fully take advantage of the Internet image corpus; a new regularization ter-m is developed to impose the positiveness exclusive prior for sparse coding formula andfurther applied for multi-class semi-supervised learning task. These developed models andalgorithms are well evaluated in respective benchmarks, and are able to outperform otherstate-of-the-art methods.The third topic is related to the Bayesian modeling and inference for video parsing,which aims to track, locate, or classify the objects of interest. Contributions are made invarious aspects. In term of video representation, a new spatio-temporal graph as well asa hybrid template are proposed for representing objects in videos; in term of modeling, aunifed generative model is built to integrate the representation model and related priors;in term of inference, a new data-driven cluster sampling algorithm and a novel featurepursuit procedure are introduced to speed up the convergence in search of optimal solution;in term of implementation, a video surveillance system is developed to combine allabove techniques to provide parsed video trajectories. These models and algorithms are extensively evaluated on several public benchmark datasets.

Keywords/Search Tags:

Computer Vision, Image/Video Parsing, Graph Embedding, NonnegativityAnalysis, Probabilistic Model, Sparse Coding

PDF Full Text Request

Related items

1	Visual Model Based On Sparse Coding And Its Applications
2	Research And Engineering Application Of Image Segmentation Based On Probabilistic Graph Model
3	Research On Graph Embedding Feature Selection Model Based On Sparse Constraints
4	Research On The Model Of Structure-augmented Knowledge Graph Embedding
5	Interpretable machine learning and sparse coding for computer vision
6	Application Of Graph Embedding Dimension Reduction Algorithm Based On Sparse Representation In Face Recognition
7	Graph-based Pattern Recognition And Its Applications In Computer Vision
8	The Image Classification Algorithm Based On Feature Extraction And Sparse Representation
9	Improved Non-negative Sparse Coding Image Classification Based On
10	Hyperspectral Remote Sensing Image Classification Via Sparse Graph Embedding