
Statistical inference for dynamical, interacting multi-object systems with emphasis on human small group interactions

Posted on: 2012-07-29
Degree: Ph.D
Type: Dissertation
University: University of Southern California
Candidate: Rozgic, Viktor
Full Text: PDF
GTID: 1458390008990647
Subject: Statistics
Abstract/Summary:
In the first part of this dissertation we present a class of sequential block sampling algorithms for tracking an unknown and variable number of objects. The proposed algorithms apply to multi-object tracking scenarios in which the only available observations are detector outputs, and also to scenarios that combine detector outputs with more complex observations that enter data-association-free likelihood models. The algorithms provide a way to construct block proposal distributions from detection-based observations. Their key components are methods for sampling from the block proposal distributions. We propose two novel methods for this purpose: one is based on a variational approximation scheme, and the other is an adaptive MCMC sampling scheme. Samples from the block proposal distributions are then used in the sequential MCMC (or SMC) framework. We tested the proposed schemes on two synthetic datasets. The results demonstrate the benefit of processing longer observation sequences in multi-object tracking problems, and show that the proposed schemes do so more efficiently than the classical sequential sampling schemes.

In the second part, we present a multi-target tracking algorithm for tracking multiple speakers with a microphone array. The sound source trajectories reconstructed by the mixture particle filter do not necessarily correspond to speech only. Therefore, we apply an adapted optimal change point algorithm to segment the obtained sound source trajectories into speech and non-speech segments. The algorithm is tested on a multi-participant meeting database, both as a separate module and as part of a multi-modal system for automatic meeting monitoring. In both cases it provided significant improvements on the speaker detection and segmentation tasks.

In the third part, we present a modality fusion algorithm that exploits the complementary properties of video tracking, microphone array localization and speaker identification, and solves the problem of speaker segmentation in the presence of overlapped speech. The proposed algorithm is novel in several respects. First, we suggest a hidden Markov model architecture that fuses three modalities: a multi-camera system for participant localization, a microphone array for speaker localization, and a speaker identification system. Second, we present a novel likelihood model for the microphone array observations that handles overlapped speech. We propose a modification of the Steered Power Response Generalized Cross Correlation Phase Transform (SPR-GCC-PHAT) function that takes into account possible microphone occlusions. We employ the multi-object detect-before-tracking approach and use the local maxima of the modified SPR-GCC-PHAT functions as sound source detectors. Multiple detection locations are fused into the joint likelihood by joint probabilistic data association.

We present a new multi-modal database for the analysis of participant behaviors in dyadic interactions. The database contains multiple channels of close- and far-field audio, video from a high-definition camera array, and motion capture data. The motion capture data allow precise analysis of low-level body-language descriptors and their comparison with similar descriptors derived from video data. The data are manually labeled by multiple human annotators using psychology-informed guides. We analyzed the relation between approach-avoidance (A-A) behavior and various non-verbal body-language and acoustic features, and the influence of the audio and video channels on the experts' labeling decisions. We also analyzed the dependency of the statistical interaction descriptors and A-A labels on the participants' roles.

Finally, we propose an ordinal regression (OR) algorithm, and its extension to time series, for estimating the approach-avoidance (A-A) behavior quantifiers (labels) in human dyadic interactions. The proposed algorithm transforms the ordinal regression problem into multiple binary classification problems, solves them with independent score-outputting classifiers, and fits the cumulative logit logistic regression model with proportional odds (CLLRMP) to the classifier score vectors. (Abstract shortened by UMI.)
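The reduction of ordinal regression to binary problems mentioned in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the dissertation: the function names, the toy data, the choice of a nearest-class-mean linear scorer as the "score-outputting classifier", and above all the final step. The abstract fits a proportional-odds cumulative logit model (CLLRMP) to the score vectors; this sketch stops at the score vectors and substitutes a much simpler rule (counting positive threshold scores) to turn them into a level.

```python
# Illustrative sketch only: ordinal labels 1..K are split into K-1 binary
# problems of the form "is the label greater than k?".

def binary_targets(labels, num_levels):
    """One 0/1 target list per threshold k = 1 .. num_levels - 1."""
    return [[1 if y > k else 0 for y in labels] for k in range(1, num_levels)]


def mean_vector(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    dim = len(rows[0])
    return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]


def train_scorer(features, targets):
    """Hypothetical stand-in classifier: nearest-class-mean linear scorer.

    Returns (w, b) such that w . x + b is positive when x is closer to the
    mean of the positive class than to the mean of the negative class.
    """
    pos = mean_vector([x for x, t in zip(features, targets) if t == 1])
    neg = mean_vector([x for x, t in zip(features, targets) if t == 0])
    w = [p - n for p, n in zip(pos, neg)]
    b = -sum(wi * (p + n) / 2.0 for wi, p, n in zip(w, pos, neg))
    return w, b


def score_vector(x, scorers):
    """Scores of the K-1 independent binary classifiers for one sample."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in scorers]


def predict_level(x, scorers):
    """Assumed combination rule (not CLLRMP): 1 + number of positive scores."""
    return 1 + sum(1 for s in score_vector(x, scorers) if s > 0)


# Toy data: one feature, three ordered levels.
X = [[0.0], [0.2], [0.4], [3.0], [3.2], [3.4], [6.0], [6.2], [6.4]]
y = [1, 1, 1, 2, 2, 2, 3, 3, 3]

scorers = [train_scorer(X, t) for t in binary_targets(y, num_levels=3)]
predictions = [predict_level(x, scorers) for x in X]
```

In a fuller treatment, the vectors returned by `score_vector` would be the inputs to the proportional-odds model rather than being thresholded directly.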
Keywords/Search Tags:Algorithm, Multi-object, Tracking, Block proposal distributions, Present, Microphone array, System, Human