
Joint-based Feature Fusion For Human Action Recognition

Posted on: 2016-08-31    Degree: Master    Type: Thesis
Country: China    Candidate: W J Wu    Full Text: PDF
GTID: 2348330488955686    Subject: Circuits and Systems
Abstract/Summary:
Human action recognition from videos is an extremely challenging task in computer vision, and it has attracted growing interest due to its wide range of applications, such as intelligent surveillance, human-computer interaction, and content-based video retrieval. Human action recognition technology is thus closely related to many aspects of daily life, and related research achievements are dramatically changing how people live. It is therefore of great practical significance to study human action recognition.

Early work mainly focused on extracting motion-related features from video data and then using a classifier learned from a training set to recognize different human actions. These methods ignore the most intrinsic information of human action, namely the inherent correlations between human joints and motions. To better exploit this information, in this thesis we adopt two ways of fusing joint-based features and recognize human actions within a multi-task learning framework and a multiple kernel learning framework, respectively. The main work of this thesis is as follows:

(1) A human action recognition method based on human joints and multi-task sparse learning is proposed. Multi-task sparse learning is used to exploit the intrinsic relationships among human joints and to realize joint feature fusion. We use the covariance matrix of each joint's trajectory over time as a discriminative per-joint descriptor; to encode the temporal information of the moving joints, multiple covariance matrices are computed over sub-sequences of different temporal granularity (a sketch of this descriptor is given below). The descriptors are then fed into a multi-task sparse learning framework to obtain a more compact and discriminative action representation. Experiments verify that the algorithm indeed improves recognition performance.

(2) A human action recognition method based on human pose and context information is proposed. A pose feature is extracted in every frame; it contains the velocity and angular velocity of each human joint to encode the motion information. To obtain a fixed-length video descriptor, we max-aggregate the pose features over a temporal hierarchy. Furthermore, since there is a strong correlation between human motion and its surroundings, we extract another feature, referred to as the context feature, to describe this valuable information. Finally, a multiple kernel learning method is adopted to combine the two heterogeneous features into a better representation (the kernel-level fusion is illustrated in the second sketch below). Experiments show that the proposed method indeed achieves better performance.
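The following minimal sketch (not the thesis code) illustrates the joint-based covariance descriptor of (1): for one joint's trajectory, a covariance matrix is computed over the whole sequence and over shorter sub-sequences of a two-level temporal hierarchy, and the upper triangle of each matrix is kept as the feature. The per-frame feature dimension (3-D joint position), the number of pyramid levels, and the NumPy implementation are illustrative assumptions.

```python
import numpy as np

def cov_descriptor(traj):
    """traj: (T, d) per-frame features of one joint -> upper triangle of its covariance."""
    C = np.cov(traj, rowvar=False)            # (d, d) covariance over time
    iu = np.triu_indices(C.shape[0])
    return C[iu]                               # C is symmetric, keep upper triangle only

def temporal_cov_descriptor(traj, levels=2):
    """Concatenate covariance descriptors over a temporal pyramid of sub-sequences."""
    feats, T = [], traj.shape[0]
    for level in range(levels):
        n_seg = 2 ** level                     # 1 segment at level 0, 2 at level 1, ...
        bounds = np.linspace(0, T, n_seg + 1, dtype=int)
        for s, e in zip(bounds[:-1], bounds[1:]):
            feats.append(cov_descriptor(traj[s:e]))
    return np.concatenate(feats)

# Example: a 60-frame 3-D trajectory of a single joint.
joint_traj = np.random.default_rng(0).standard_normal((60, 3))
print(temporal_cov_descriptor(joint_traj).shape)   # 3 segments x 6 values = (18,)
```

The second sketch shows one way the pose and context features of (2) could be fused at the kernel level. Proper multiple kernel learning optimizes the kernel weights jointly with the classifier; here the weights are fixed constants, which only illustrates the fusion idea. The RBF kernels, feature dimensions, weights, and the scikit-learn SVM are assumptions, not the setup actually used in the thesis experiments.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def combined_kernel(pose_a, ctx_a, pose_b, ctx_b, w_pose=0.6, w_ctx=0.4):
    """Weighted sum of per-feature RBF kernels between two sample sets."""
    return w_pose * rbf_kernel(pose_a, pose_b) + w_ctx * rbf_kernel(ctx_a, ctx_b)

# Toy data: 40 training videos and 10 test videos, pose dim 128, context dim 64.
rng = np.random.default_rng(1)
pose_tr, ctx_tr = rng.standard_normal((40, 128)), rng.standard_normal((40, 64))
pose_te, ctx_te = rng.standard_normal((10, 128)), rng.standard_normal((10, 64))
y_tr = rng.integers(0, 3, 40)                  # 3 action classes

clf = SVC(kernel="precomputed")
clf.fit(combined_kernel(pose_tr, ctx_tr, pose_tr, ctx_tr), y_tr)
pred = clf.predict(combined_kernel(pose_te, ctx_te, pose_tr, ctx_tr))
print(pred)
```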
Keywords/Search Tags: Action Recognition, Feature Fusion, Multi-task Sparse Learning, Context Information, Multiple Kernel Learning