Exploring image and video by classification and clustering on global and local visual features

Posted on: 2008-05-24
Degree: Ph.D
Type: Dissertation
University: The Johns Hopkins University
Candidate: Lu, Le
Full Text: PDF
GTID: 1448390005454208
Subject: Artificial Intelligence
Abstract/Summary:
Images and videos are complex 2-dimensional spatially correlated data patterns or 3-dimensional spatio-temporally correlated data volumes. Associating visual data signals (acquired by imaging sensors) with high-level semantic human knowledge is the core challenge of supervised pattern recognition and computer vision. Finding the underlying correlations within large amounts of image or video data is a separate, unsupervised problem of data self-structuring. Both the previous literature and our own research, with computing machines as tools, include many efforts to address these two tasks statistically, making use of recently developed supervised (a.k.a. classification) and unsupervised (a.k.a. clustering) statistical machine learning paradigms. In this dissertation, we study four specific computer vision problems involving unsupervised visual data partitioning, discriminative multiple-class classification and online adaptive appearance learning, using statistical machine learning techniques. Our four tasks are based on extracting both global and local visual appearance patterns in general image and video domains. First, we develop a new clustering algorithm that partitions temporal video structures into piecewise elements (a.k.a. video shot segmentation) by combining central and subspace constraints in a unified solution. The proposed algorithm is also shown to be applicable to illumination-invariant face clustering. Second, we detect and recognize spatial-temporal video subvolumes as action units using a trained 3D-surface action model via multi-scale temporal searching. The dynamic 3D-surface based action model is built as an empirical distribution over basic static posture elements, in the spirit of texton representation. Thus the action matching process is based on similarity measurement among histograms.
The basic posture units are treated as intermediate visual representations learned by a three-stage clustering algorithm from figure-segmented image sequences. Third, we train a discriminative-probabilistic multi-modal density classifier to evaluate the responses of 20 semantic material classes on a large collection of challenging home photos. The task of learning photo categories is then based on global image features extracted from the material class-specific density response maps over the spatial domain. We adopt a classifier combination technique over a set of random weak discriminators to handle the complex multi-modal photo-feature distributions in a high-dimensional parameter space. Fourth, we propose a unified nonparametric approach for three applications: location based dynamic template video tracking in low to medium resolution, segmentation based object-level image matching across viewpoints, and binary foreground/background segmentation tracking. The main contributions lie in three areas: (1) we demonstrate that an online classification framework allows very flexible construction of image density matching functions to address the general data-driven classification problem; (2) we devise an effective dynamic appearance modeling algorithm that requires only simple nonparametric computations (mean, median, standard deviation) and is therefore easy to implement; (3) we present a random patch based computational representation for classifying image segments in object-specific matching and tracking, which is highly descriptive and discriminative compared with general image segment descriptors. The proposed approach has been extensively demonstrated to maintain effective object-level appearance models robustly over time under a variety of challenging conditions, such as severely changing, occluded and deformable appearance templates and moving cameras.
Keywords/Search Tags:Image, Video, Classification, Clustering, Visual, Data, Appearance, Global