
Invariance in human action analysis

Posted on: 2004-10-16
Degree: Ph.D
Type: Thesis
University: University of Central Florida
Candidate: Rao, Cen
Subject: Computer Science
Abstract/Summary:
Recognition of human actions from video sequences is an active area of research in computer vision. Possible applications include video surveillance and monitoring, human-computer interfaces, model-based compression, and augmented reality. The motion of an object can be captured by its trajectory. Studies of human motion perception show that the information used to represent motion comes from changes in the speed and direction of the trajectory. In this dissertation, we propose a computational representation of human action that captures these changes using the spatio-temporal curvature of 2-D trajectories. This representation is compact, view-invariant, and capable of explaining an action in terms of meaningful action units called "dynamic instants" and "intervals". A dynamic instant is an instantaneous entity, occurring in only one frame, that represents an important change in the motion characteristics of the action agent. An interval is the time period between two dynamic instants during which the action agent's motion characteristics do not change. Starting without a model, we use this representation for recognition and incremental learning of human actions. Dynamic Time Warping is employed to match action trajectories under a view-invariant similarity measure, and nearest-neighbor clustering is used to learn human actions without any supervised training. The proposed method can discover instances of the same action performed by different people from different viewpoints. Our approach relies heavily on the properties of epipolar geometry and employs rank constraints to match 2-D projections of a 3-D action, eliminating the distortion due to projection without explicitly reconstructing the 3-D trajectory. We also propose the use of a rank constraint on the fundamental matrix for spatio-temporal alignment of video sequences.
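As a rough illustration of the representation described above, the following sketch (in Python with NumPy) treats time as a third coordinate of the trajectory, computes the curvature of the resulting 3-D space curve, and flags curvature maxima as candidate dynamic instants. The function names and the peak threshold are assumptions for illustration, not the dissertation's actual implementation:

```python
import numpy as np

def spatiotemporal_curvature(x, y):
    """Curvature of the 3-D curve r(t) = (x(t), y(t), t).

    Treating time as a third coordinate makes the curvature sensitive
    to changes in both the speed and the direction of the 2-D
    trajectory, using k = |r' x r''| / |r'|^3.
    """
    t = np.arange(len(x), dtype=float)
    r = np.stack([x, y, t], axis=1)
    r1 = np.gradient(r, axis=0)       # first derivative r'
    r2 = np.gradient(r1, axis=0)      # second derivative r''
    cross = np.cross(r1, r2)
    speed = np.linalg.norm(r1, axis=1)
    return np.linalg.norm(cross, axis=1) / np.maximum(speed ** 3, 1e-12)

def dynamic_instants(curvature, threshold=0.1):
    """Frames where the curvature has a local maximum above a threshold.

    These are candidate dynamic instants; the frames between two
    consecutive instants form an interval. The threshold is an
    assumed illustrative value.
    """
    k = curvature
    return [i for i in range(1, len(k) - 1)
            if k[i] > k[i - 1] and k[i] >= k[i + 1] and k[i] > threshold]
```

For example, an L-shaped trajectory (move right, then move up) is flat on both straight segments and produces a single curvature peak at the corner frame, which is reported as the one dynamic instant separating the two intervals.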
This rank constraint is more robust and does not require actually computing the fundamental matrix, making it cheaper to evaluate than previous fundamental-matrix-based approaches. We propose a dynamic programming approach that uses the rank constraint to find the nonlinear time-warping function for videos containing human activities. In this way, videos of different individuals, taken at different times and from distinct viewpoints, can be synchronized. Moreover, a temporal pyramid of trajectories is applied to improve the accuracy of the view-invariant dynamic time warping. We demonstrate several applications of this approach, including video synthesis, human action recognition, and computer-aided training, and show substantial improvement over state-of-the-art techniques. This dissertation makes two fundamental contributions to view-invariant action recognition: (1) a view-invariant representation of action trajectories based on dynamic instant detection, and (2) a view-invariant Dynamic Time Warping to measure the similarity between two trajectories. We successfully apply the view-invariant spatio-temporal information of action trajectories to both action recognition and video synchronization, without explicitly reconstructing 3-D information.
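One way to see why a rank constraint avoids computing the fundamental matrix: each correspondence q_i^T F p_i = 0 is one linear constraint on the nine entries of F, so a consistent F exists exactly when the stacked 9-column design matrix is rank-deficient, and its smallest singular value measures how badly the constraint is violated. The sketch below (Python/NumPy) uses that singular value as a frame-matching cost inside a standard dynamic-programming time warping; this particular cost and the plain DTW recurrence are illustrative assumptions, not the dissertation's exact formulation:

```python
import numpy as np

def rank_cost(p, q):
    """Epipolar-consistency cost for two point sets p, q (each N x 2, N >= 9).

    Each correspondence contributes one row of the 9-column design
    matrix A built from q_i^T F p_i = 0; some fundamental matrix F fits
    all correspondences iff A is rank-deficient, so the smallest
    singular value of A measures the violation without ever
    computing F itself.
    """
    A = np.array([[u2 * u1, u2 * v1, u2, v2 * u1, v2 * v1, v2, u1, v1, 1.0]
                  for (u1, v1), (u2, v2) in zip(p, q)])
    return np.linalg.svd(A, compute_uv=False)[-1]

def dtw(seq_a, seq_b, cost):
    """Standard dynamic-programming time warping over two frame
    sequences, with a pluggable frame-to-frame cost (e.g. rank_cost)."""
    n, m = len(seq_a), len(seq_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(seq_a[i - 1], seq_b[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

When p and q really are two projections of the same 3-D point set, a valid fundamental matrix exists and the cost is near zero; mismatched correspondences raise it, which is what lets the dynamic program pick out the correct temporal alignment.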
Keywords: Action, Human, Video, Recognition, Trajectories, Dynamic time warping, View-invariant