
Research On Action Recognition Based On Feature Fusion

Posted on: 2019-08-19
Degree: Master
Type: Thesis
Country: China
Candidate: B Y Chao
Full Text: PDF
GTID: 2428330566967780
Subject: Signal and Information Processing
Abstract/Summary:
Action recognition is widely used in real life, for example in intelligent surveillance, virtual reality, video retrieval, human-machine interaction, and the analysis of customer types and shopping behavior. However, cluttered backgrounds, target occlusion, and changes in lighting and camera viewpoint all degrade recognition performance, so robust action recognition algorithms are urgently needed. This thesis studies action recognition from the perspective of feature fusion. The main work includes the following aspects:

(1) Hand-crafted feature extraction. Among traditional action recognition methods, dense trajectories give the best results, so this thesis describes video sequences with hand-crafted trajectory features. Specifically, dense optical flow is first computed over 15 consecutive frames, then HOG, HOF, and MBH descriptors are extracted along the resulting trajectories, and these three descriptors are fused into a single feature, denoted Fh.

(2) Deep CNN feature extraction. Because a deep learning model can learn features directly from video samples, it has an advantage over traditional action recognition methods. This thesis therefore adopts a two-stream CNN for video feature extraction, consisting of a spatial-stream and a temporal-stream convolutional neural network. The method is as follows: first, the GoogLeNet architecture is selected and its parameters configured; second, the RGB images and optical flow images of each video sequence are fed into the spatial and temporal networks, respectively, for training. To give the model better generalization ability, the network parameters are initialized from a pre-trained model. To prevent overfitting, the number of samples is increased by corner cropping and scale jittering. To exploit the local temporal structure of video samples, each video is divided into segments along the time axis. Finally, the output of the Global_Pool layer in each stream is extracted, yielding two CNN features of the video sequence, denoted Ft and Fsl.

(3) CNN feature extraction based on saliency images. Owing to how videos are shot and to the visual attention mechanism of the human eye, human behavior mainly occurs in a salient region of the video frame, so a salient region is computed; describing the behavior within this region suppresses the influence of the background and characterizes the action better. This thesis therefore uses a video-object-segmentation saliency detection method to obtain the saliency map of each video and feeds the saliency maps to the spatial convolutional network for training, producing a saliency spatial-stream network model. The output of its Global_Pool layer is then extracted as the saliency CNN feature of the video sequence, denoted F2.

(4) Action recognition based on feature fusion. This thesis fuses the hand-crafted feature and the deep CNN features, features of two different modalities, and uses an SVM to learn the classifier and finally recognize the actions.

Experiments were carried out on the UCF101 dataset. First, the spatial convolutional network gives two different results: 82.7% accuracy with RGB input and 80.67% with saliency input. The saliency input is slightly less accurate than the RGB input, but because it concentrates on the salient region it reduces the amount of computation and speeds up training. In the fusion experiment, the two CNN features from the saliency spatial network and the temporal network are fused with the hand-crafted feature. The results show that the feature-fusion method reaches an accuracy of 94.35%, an improvement of 12.35% over the dense-trajectory recognition method, and an improvement of 0.62% over the two-stream convolutional network, which achieves 93.73%. The feature-fusion method of this thesis is therefore applicable to the field of action recognition.
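The descriptor fusion in step (1) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the descriptor dimensions (HOG 96-d, HOF 108-d, MBH 192-d) follow the common improved-dense-trajectory setup and are assumptions, as the abstract does not state them, and the per-descriptor L2 normalization before concatenation is likewise an assumed choice.

```python
import numpy as np

def fuse_trajectory_descriptors(hog, hof, mbh):
    """L2-normalize each per-trajectory descriptor, then
    concatenate them into the fused hand-crafted feature Fh."""
    def l2(x):
        n = np.linalg.norm(x)
        return x / n if n > 0 else x
    return np.concatenate([l2(hog), l2(hof), l2(mbh)])

# Placeholder descriptors; real ones come from dense-trajectory extraction.
hog = np.random.rand(96)
hof = np.random.rand(108)
mbh = np.random.rand(192)
fh = fuse_trajectory_descriptors(hog, hof, mbh)
print(fh.shape)  # (396,)
```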
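The temporal segmentation in step (2), dividing a video into segments along the time axis so that local temporal structure is covered, can be sketched like this. The number of segments and the center-frame sampling rule are assumptions for illustration; the abstract does not specify them.

```python
import numpy as np

def sample_segments(num_frames, num_segments=3):
    """Split the frame axis into equal segments and return the
    center frame index of each, so snippets cover the whole video."""
    edges = np.linspace(0, num_frames, num_segments + 1)
    centers = ((edges[:-1] + edges[1:]) / 2).astype(int)
    return centers.tolist()

print(sample_segments(90, 3))  # [15, 45, 75]
```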
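The multi-modal fusion in step (4) can be sketched as simple normalized concatenation of the four features before classifier learning. The feature dimensions are placeholders (1024-d for the GoogLeNet Global_Pool outputs is an assumption), and the fused vector would then be passed to an SVM trainer, which is omitted here to keep the sketch self-contained.

```python
import numpy as np

def fuse_features(fh, ft, fsl, f2):
    """Concatenate the hand-crafted feature Fh with the two-stream
    CNN features Ft, Fsl and the saliency CNN feature F2 after
    per-feature L2 normalization; the result is the SVM input."""
    def l2(x):
        n = np.linalg.norm(x)
        return x / n if n > 0 else x
    return np.concatenate([l2(f) for f in (fh, ft, fsl, f2)])

# Placeholder feature vectors with assumed dimensions.
fh = np.random.rand(396)
ft = np.random.rand(1024)
fsl = np.random.rand(1024)
f2 = np.random.rand(1024)
v = fuse_features(fh, ft, fsl, f2)
print(v.shape)  # (3468,)
```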
Keywords/Search Tags: action recognition, two-stream convolutional network, improved dense trajectory, saliency map, feature fusion