Fusion Of Multiple Visual Objects For Action Recognition

Posted on:2016-01-23

Degree:Master

Type:Thesis

Country:China

Candidate:J Liu

Full Text:PDF

GTID:2308330476954982

Subject:Computer Science and Technology

Abstract/Summary:

Action recognition is a highly active research topic in computer vision and pattern recognition, with many applications such as surveillance, virtual reality, and human-computer interaction. However, recognizing actions in realistic videos from unconstrained environments remains challenging because of large appearance variations of human bodies, background clutter, and camera movement. In realistic environments, objects and scenes provide a rich source of contextual information for analyzing human actions, since actions often occur in particular scene settings with certain related objects. This thesis therefore exploits contextual objects and scenes to improve action recognition performance.

This thesis proposes a method that fuses multiple visual objects by modeling the relationships among action, object, and scene. Specifically, a latent structural SVM is introduced to model the co-occurrence relationships among action, object, and scene, in which the object class label and the scene class label are treated as latent variables. Within this framework, action, object, and scene class labels can all be predicted, and the object location is estimated simultaneously as a by-product. Experimental results demonstrate that the proposed method improves action recognition performance.

Moreover, a mid-level discriminative feature is used to describe the visual objects, and this thesis proposes to train the pre-learned classifiers behind this feature using transfer learning. The mid-level class correlation feature is a set of decision values from the pre-learned classifiers of all classes, each measuring the likelihood that the input video belongs to the corresponding class. To train the pre-learned classifiers, this thesis proposes a transfer learning method from images to videos, because objects and scenes are blurry in the limited video training data, and labeling training samples is time-consuming and labor-intensive. Specifically, labeled Web images are used to train the initial classifiers, and an unsupervised domain adaptation method is used to reduce the discrepancy between the source image domain and the target video domain. Experimental results demonstrate the discriminative power of the mid-level feature and the effectiveness of the transfer learning method.
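
The role of the latent object and scene labels can be illustrated with a minimal sketch of joint inference. The abstract does not give the scoring function, so the additive form below (per-label scores plus action-object and action-scene co-occurrence weights) is an assumption, as are all names and numbers; object localization is left out. In a latent structural SVM these weights would be learned, whereas here they are fixed by hand.

```python
from itertools import product


def joint_score(action_scores, object_scores, scene_scores,
                action_object, action_scene, a, o, s):
    """Score one (action, object, scene) labeling of a video.

    Assumed additive form: unary scores for each label plus pairwise
    co-occurrence weights between the action and its context.
    """
    return (action_scores[a] + object_scores[o] + scene_scores[s]
            + action_object[a][o] + action_scene[a][s])


def infer(action_scores, object_scores, scene_scores,
          action_object, action_scene):
    """Return the best (action, object, scene) triple and its score.

    The object and scene labels are latent: they are not given at test
    time, so they are maximized over together with the action label.
    """
    best, best_score = None, float("-inf")
    for a, o, s in product(range(len(action_scores)),
                           range(len(object_scores)),
                           range(len(scene_scores))):
        score = joint_score(action_scores, object_scores, scene_scores,
                            action_object, action_scene, a, o, s)
        if score > best_score:
            best, best_score = (a, o, s), score
    return best, best_score


# Hypothetical toy values: 2 actions, 2 objects, 2 scenes.
action_scores = [1.0, 1.2]           # action 1 looks slightly better alone
object_scores = [2.0, 0.1]           # object 0 is strongly detected
scene_scores = [0.5, 0.4]
action_object = [[1.0, -1.0],        # action 0 co-occurs with object 0
                 [-1.0, 1.0]]        # action 1 co-occurs with object 1
action_scene = [[0.5, 0.0],
                [0.0, 0.5]]

labels, score = infer(action_scores, object_scores, scene_scores,
                      action_object, action_scene)
```

In this toy case the action-only scores favor action 1, but the strongly detected object 0 and its co-occurrence weight shift the joint decision to action 0, which is the kind of contextual correction the abstract describes.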

Keywords/Search Tags:

action recognition, context modeling, latent structural SVM, mid-level feature, transfer learning

Related items

1	Research Based On Local Spatiotemporal Features And Parts For Human Action Recognition From Videos
2	Analyzing And Understanding Human Actions In Videos
3	Human Action Recognition And Suspected Cheating Behavior Detection Based On Latent SVM
4	Joint-based Feature Fusion For Human Action Recognition
5	Research On Action Recognition Method Based On Feature Representation
6	Research On Cross-domain Action Recognition Via Transfer Learning
7	Research On Video Action Recognition Based On Transfer Learning
8	Research On Action Recognition Algorithm Based On Mid-level Network Structure
9	Research On Image-based Action Recognition Based On Context And Feature Fusion
10	Deep Transfer Learning For Action Recognition