
Fusion Of Multiple Visual Objects For Action Recognition

Posted on: 2016-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: J Liu
GTID: 2308330476954982
Subject: Computer Science and Technology
Abstract/Summary:
Action recognition is a highly active research topic in computer vision and pattern recognition, with many applications, such as surveillance, virtual reality, and human-computer interaction. However, recognizing actions in realistic videos from unconstrained environments remains challenging because of large variations in human appearance, background clutter, and camera movement. In realistic environments, objects and scenes provide a rich source of contextual information for analyzing human actions, since actions often occur in particular scene settings with certain related objects. This thesis therefore uses contextual objects and scenes to improve action recognition performance.
This thesis proposes a method that fuses multiple visual objects by modeling the relationships among action, object, and scene. Specifically, a latent structural SVM is introduced to model the co-occurrence relationships among action, object, and scene, with the object class label and the scene class label treated as latent variables. Within this framework, action, object, and scene class labels can all be predicted, and the object location is estimated simultaneously as a by-product. Experimental results demonstrate that the proposed method improves action recognition performance.
Moreover, because a mid-level discriminative feature is used to describe the visual objects, this thesis proposes training the pre-learned classifiers behind that feature with transfer learning. The mid-level class correlation feature is a set of decision values from the pre-learned classifiers of all classes, each measuring the likelihood that the input video belongs to the corresponding class.
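
As a minimal sketch of the class correlation feature described above, the snippet below stacks the decision values of K pre-learned classifiers into one K-dimensional vector. It assumes linear classifiers of the form w·x + b; the abstract does not state the classifier type, and the function name, weights, biases, and descriptor values are hypothetical toy values chosen only for illustration.

```python
import numpy as np


def class_correlation_feature(x, weights, biases):
    """Stack the decision values of K pre-learned linear classifiers.

    x:       (D,) low-level descriptor of one input video
    weights: (K, D) one weight vector per class (assumed linear classifiers)
    biases:  (K,) one bias per class
    Returns a (K,) vector whose k-th entry is classifier k's decision value.
    """
    x = np.asarray(x, dtype=float)
    return weights @ x + biases


# Toy setup (hypothetical): 3 classes over a 4-dimensional descriptor.
weights = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])
biases = np.array([0.0, -0.5, 0.1])
video_descriptor = np.array([0.2, 0.9, 0.1, 0.3])

# One mid-level feature vector: one decision value per class.
feature = class_correlation_feature(video_descriptor, weights, biases)
```

A larger entry in `feature` means the corresponding classifier considers the input more likely to belong to its class, so the vector as a whole describes the input in terms of all classes at once.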
To train the pre-learned classifiers, this thesis proposes a transfer learning method from images to videos, because objects and scenes are blurred in the limited video training data and labeling training samples is time-consuming and labor-intensive. Specifically, labeled Web images are used to train the initial classifiers, and an unsupervised domain adaptation method is used to reduce the discrepancy between the source image domain and the target video domain. Experimental results demonstrate the discriminative power of the mid-level feature and the effectiveness of the transfer learning method.
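
The abstract does not name the domain adaptation method it uses, so the sketch below substitutes a deliberately simple stand-in: per-domain feature standardization, in which each domain is z-scored with its own statistics and no video labels are needed. The nearest-centroid classifier, the synthetic features, and all names here are assumptions for illustration only, not the thesis's method.

```python
import numpy as np


def standardize_per_domain(features):
    """Z-score features using only the statistics of their own domain."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant dimensions
    return (features - mean) / std


def train_nearest_centroid(features, labels):
    """Fit one centroid per class on labeled source features."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_nearest_centroid(model, features):
    """Assign each sample to the class of its closest centroid."""
    classes, centroids = model
    diffs = features[:, None, :] - centroids[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    return classes[distances.argmin(axis=1)]


rng = np.random.default_rng(0)

# Source domain: synthetic stand-in for labeled Web image features.
image_feats = np.vstack([rng.normal(-2, 0.5, (50, 2)),
                         rng.normal(2, 0.5, (50, 2))])
image_labels = np.array([0] * 50 + [1] * 50)

# Target domain: synthetic stand-in for video features, shifted and rescaled.
video_raw = np.vstack([rng.normal(-2, 0.5, (50, 2)),
                       rng.normal(2, 0.5, (50, 2))])
video_feats = video_raw * 3.0 + 10.0
video_labels = np.array([0] * 50 + [1] * 50)  # used for evaluation only

# Baseline: train on images, apply to videos without any adaptation.
baseline_model = train_nearest_centroid(image_feats, image_labels)
baseline_pred = predict_nearest_centroid(baseline_model, video_feats)
baseline_accuracy = (baseline_pred == video_labels).mean()

# Adapted: align each domain with its own statistics, then train and predict.
image_aligned = standardize_per_domain(image_feats)
video_aligned = standardize_per_domain(video_feats)
adapted_model = train_nearest_centroid(image_aligned, image_labels)
adapted_pred = predict_nearest_centroid(adapted_model, video_aligned)
adapted_accuracy = (adapted_pred == video_labels).mean()
```

Comparing `baseline_accuracy` with `adapted_accuracy` shows how much of the image-to-video gap this toy alignment closes. Real image and video features differ in more than mean and scale, which is why a dedicated domain adaptation method is needed in practice.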
Keywords/Search Tags:action recognition, context modeling, latent structural SVM, mid-level feature, transfer learning