Human Action Recognition Based On Multi-level Feature Fusion

Posted on: 2017-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Xu
GTID: 2308330503983629
Subject: Computer system architecture
Abstract/Summary:
Video-based human action recognition is an active research topic in computer vision, with wide application and potential economic value in human-computer interaction, video analysis and public security. The main task of action recognition is to process and analyze raw image sequences in order to learn and understand human actions or behavior.

There are two main problems in human action recognition: action representation and action classification. Action representation describes an action by extracting valid features from the video, and action classification builds a classification model from the extracted features. According to how actions are represented, existing recognition methods can be grouped into human-model-based methods, global feature methods and local feature methods. Local features are the more popular way of representing human actions, and they achieve advanced recognition results when combined with a bag-of-features description.

The crucial issues in action recognition are video feature extraction and description, which significantly affect the classification results. This dissertation therefore focuses on feature extraction and description. The main work covers human action recognition based on multi-level feature fusion and human action recognition based on skeletons.

For human action recognition based on multi-level feature fusion, an efficient multi-level feature fusion descriptor is introduced. The descriptor combines low-level features, comprising three trajectory features, HOF and SIFT, with a mid-level class correlation feature.

In the low-level feature extraction stage, trajectory, HOF and SIFT features are extracted. Dense trajectories are used to represent actions, motivated by their recent popularity in image recognition.
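The bag-of-features description mentioned above can be illustrated with a small sketch. It is not the dissertation's implementation: the plain k-means codebook, the function names (build_codebook, assign_words, bag_of_features), the codebook size and the random toy descriptors are all assumptions chosen for illustration. The idea is that local descriptors are quantized against a learned codebook and each video becomes a normalized histogram of visual words.

```python
import numpy as np


def assign_words(descriptors, centers):
    """Return the index of the nearest visual word for every descriptor."""
    # Squared Euclidean distance between each descriptor and each center.
    dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def build_codebook(descriptors, k, iterations=10, seed=0):
    """Cluster local descriptors into k visual words with plain k-means."""
    rng = np.random.default_rng(seed)
    # Start from k distinct descriptors picked at random.
    picks = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[picks].copy()
    for _ in range(iterations):
        labels = assign_words(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def bag_of_features(descriptors, centers):
    """Encode one video as an L1-normalized histogram of visual words."""
    labels = assign_words(descriptors, centers)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()


# Toy data standing in for local descriptors (assumed 8-dimensional here).
rng = np.random.default_rng(42)
training_descriptors = rng.normal(size=(200, 8))
video_descriptors = rng.normal(size=(50, 8))

codebook = build_codebook(training_descriptors, k=5)
histogram = bag_of_features(video_descriptors, codebook)
print(histogram.shape, histogram.sum())
```

In practice the descriptors would come from the video (for example trajectory, HOF or SIFT vectors) and the codebook would be far larger; the histogram is what a classifier would receive.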
Dense trajectories are obtained by tracking densely sampled points using optical flow fields. Because HOG (histograms of oriented gradients) and MBH (motion boundary histograms) are effective descriptors on a variety of datasets, they are used to describe the dense trajectories. In addition to the trajectory features, the HOF descriptor and SIFT features are extracted to describe actions comprehensively.

Different action classes often share similar motion patterns of a body part, and such correlations among action classes can be used to distinguish different actions. Therefore, the probabilities that a video belongs to each action class are used directly as the mid-level class correlation feature.

In the feature fusion stage, the above-mentioned features are fused.

For human action recognition based on skeletons, the background varies so little that objects can be extracted effectively by background difference. The skeleton is then obtained by image thinning. After the skeleton is extracted, Hu moments are computed to represent the skeleton feature, and bag-of-features is also used to describe it.

Experiments are carried out on the standard UCF Sports, KTH and CASIA video datasets, and they verify the feasibility and validity of the proposed method.
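The background difference and Hu moment steps of the skeleton-based part can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's code: the threshold value, the function names (foreground_mask, hu_moments) and the toy L-shaped object are hypothetical, and the image thinning step is omitted, so the moments are computed on the foreground mask rather than on a thinned skeleton.

```python
import numpy as np


def foreground_mask(frame, background, threshold=30):
    """Background difference: flag pixels that differ from the background."""
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    return diff > threshold  # threshold is an assumed value


def hu_moments(mask):
    """Return the seven Hu moment invariants of a non-empty binary mask."""
    ys, xs = np.nonzero(mask)
    m00 = float(len(xs))                     # area of the shape in pixels
    dx, dy = xs - xs.mean(), ys - ys.mean()  # coordinates relative to centroid

    def eta(p, q):
        # Normalized central moment: mu_pq / mu_00^(1 + (p + q) / 2).
        return np.sum(dx ** p * dy ** q) / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03          # sums that recur in h4..h7
    c, d = n30 - 3 * n12, 3 * n21 - n03  # differences that recur in h3, h5, h7

    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ])


# Toy scene: a static empty background and an L-shaped object.
background = np.zeros((20, 20), dtype=np.uint8)
frame = background.copy()
frame[5:15, 5:8] = 200    # vertical bar of the object
frame[12:15, 5:14] = 200  # horizontal bar of the object
shifted = np.roll(frame, shift=(3, 4), axis=(0, 1))  # same object, moved

hu_original = hu_moments(foreground_mask(frame, background))
hu_shifted = hu_moments(foreground_mask(shifted, background))
print(np.allclose(hu_original, hu_shifted))
```

Because Hu moments are built from central moments, the two feature vectors should agree even though the object has moved, which is the property that makes them usable as a shape feature across frames.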
Keywords/Search Tags: Human Action Recognition, Multi-level Feature, Bag-of-Words, Extraction and Description of Skeleton