Exploring image and video by classification and clustering on global and local visual features

Posted on:2008-05-24

Degree:Ph.D

Type:Dissertation

University:The Johns Hopkins University

Candidate:Lu, Le

GTID:1448390005454208

Subject:Artificial Intelligence

Abstract/Summary:

Images and videos are complex two-dimensional spatially correlated data patterns or three-dimensional spatio-temporally correlated data volumes. Associating visual data signals (acquired by imaging sensors) with high-level semantic human knowledge is the core challenge of supervised pattern recognition and computer vision. Finding the underlying correlations within large amounts of image or video data is a separate, unsupervised problem of data self-structuring. In the previous literature and in our own research, many efforts address these two tasks statistically by making good use of recently developed supervised (a.k.a. classification) and unsupervised (a.k.a. clustering) statistical machine learning paradigms.

In this dissertation, we study four specific computer vision problems involving unsupervised visual data partitioning, discriminative multi-class classification, and online adaptive appearance learning, using statistical machine learning techniques. All four tasks are based on extracting both global and local visual appearance patterns in general image and video domains.

First, we develop a new clustering algorithm that partitions temporal video structures into piecewise elements (a.k.a. video shot segmentation) by combining central and subspace constraints in a unified solution. We also demonstrate the algorithm's applicability to illumination-invariant face clustering.

Second, we detect and recognize spatio-temporal video subvolumes as action units using a trained 3D-surface action model via multi-scale temporal searching. The dynamic 3D-surface-based action model is built as an empirical distribution over basic static posture elements, in the spirit of texton representation, so the action matching process is based on similarity measurement among histograms. The basic posture units are treated as intermediate visual representations learned by a three-stage clustering algorithm from figure-segmented image sequences.

Third, we train a discriminative-probabilistic multi-modal density classifier to evaluate the responses of 20 semantic material classes on a large collection of challenging home photos. Photo categories are then learned from global image features extracted from the material class-specific density response maps over the spatial domain. We combine a set of random weak discriminators to handle the complex multi-modal photo-feature distributions in a high-dimensional parameter space.

Fourth, we propose a unified nonparametric approach for three applications: location-based dynamic template video tracking at low to medium resolution, segmentation-based object-level image matching across viewpoints, and binary foreground/background segmentation tracking. The main contributions lie in three areas: (1) we demonstrate that an online classification framework allows very flexible construction of image density matching functions to address the general data-driven classification problem; (2) we devise an effective dynamic appearance modeling algorithm that requires only simple nonparametric computations (mean, median, standard deviation) and is therefore easy to implement; (3) we present a random-patch-based computational representation for classifying image segments in object-specific matching and tracking, which is highly descriptive and discriminative compared with general image segment descriptors. The proposed approach is extensively demonstrated to maintain effective object-level appearance models robustly over time under a variety of challenging conditions, such as severely changing, occluded, and deformable appearance templates and moving cameras.
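The histogram-based action matching described for the second task can be illustrated with a minimal sketch. It is not the dissertation's implementation: the abstract does not name the similarity measure, so the chi-square distance used here is an assumption, as are the function names, the nearest-prototype labeling step, and the toy feature vectors. Multi-scale temporal searching and the three-stage clustering that learns the posture units are left out.

```python
from math import dist  # Euclidean distance, Python 3.8+

def nearest_posture(frame_feature, posture_centers):
    """Index of the closest posture prototype (assumed Euclidean metric)."""
    return min(range(len(posture_centers)),
               key=lambda k: dist(frame_feature, posture_centers[k]))

def posture_histogram(frames, posture_centers):
    """Normalized histogram of posture labels over a sequence of frame features."""
    counts = [0] * len(posture_centers)
    for feature in frames:
        counts[nearest_posture(feature, posture_centers)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def chi_square_distance(h1, h2, eps=1e-12):
    """Chi-square distance between two histograms (an assumed choice of measure)."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))

def best_match(query_hist, action_models):
    """Label of the action model whose posture histogram is closest to the query."""
    return min(action_models,
               key=lambda name: chi_square_distance(query_hist, action_models[name]))

# Hypothetical toy data: three posture prototypes and a four-frame clip.
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
frames = [(0.1, 0.0), (0.9, 0.1), (1.0, 0.0), (0.0, 0.9)]
models = {"wave": [0.2, 0.6, 0.2], "walk": [0.7, 0.1, 0.2]}

query = posture_histogram(frames, centers)
label = best_match(query, models)
```

In this sketch each action is summarized only by how often each posture occurs, which is what makes matching reducible to comparing histograms; the temporal order of postures is deliberately discarded.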

Keywords/Search Tags:

Image, Video, Classification, Clustering, Visual, Data, Appearance, Global

Related items

1	Human appearance modeling in visual surveillance
2	Image Classification And Segmentation Algorithm Based On Clustering
3	The Research Of Tracking Algorithm Of Moving Object Based On Video Image Sequences
4	Researches On Efficient Hashing For Video Retrieval
5	Key Technologies Of The Final Appearance Of The Pcb Inspection Machines
6	The Research And Application Of Large-Scale Image Classification And Robust Subspace Clustering Algorithm For Big Data
7	Multidimensional Data Clustering Algorithm Research And GPU Acceleration Based Global K-means
8	The Content-analysis Based Image And Video Coding
9	Statistical shape and appearance models for segmentation and classification
10	Multivariate IB Clustering Algorithm Based On Image Visual Context