Research On Human Action Recognition And Analysis Based On Deep Learning

Posted on: 2018-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Fu
GTID: 2348330536981946
Subject: Control Science and Engineering
Abstract/Summary:
Video surveillance has played an important role in security at many kinds of sites in recent years, and its range of applications is still growing. However, traditional manual video surveillance suffers from high cost, a tendency to miss important information, low accuracy, and low efficiency. Intelligent video surveillance, with its clear advantages in real-time operation and proactive response, has therefore become the new direction of development. The core problem in intelligent video surveillance is the recognition and analysis of human actions in video. In action recognition, the results of traditional methods depend heavily on the quality of hand-crafted features, which require complex computation and substantial background knowledge of computer vision, yet still generalize poorly. Convolutional neural networks, a deep learning method, loosely mimic the way the biological brain processes visual information: they learn image features autonomously, which greatly simplifies the traditional hand-crafted feature extraction process.

This thesis focuses on designing the structure and the feature fusion method of a two-stream deep convolutional neural network. In the structural design, two separate streams, a spatial stream and a temporal stream, mimic the ventral and dorsal streams of the brain's visual cortex and handle static and dynamic information respectively, so that static and dynamic features are extracted separately. Within each stream, a stack of convolutional layers with small receptive fields replaces a single convolutional layer with a large receptive field; this keeps the same overall receptive field while making the decision function more discriminative and reducing the number of parameters. To counter overfitting, pre-training, dropout, and early stopping are applied.

In the feature fusion design, three strategies are considered: extracting video-level features first, fusing static and dynamic features first, and fusing video features directly. Their performance is evaluated on the KTH action database. In particular, when video-level features are extracted from single-frame features, a weighted average that takes the dispersion of the features into account is introduced, which improves discrimination. The recognition accuracies of the two-stream network and a single-stream network are also compared, confirming the advantage of the two-stream structure.

Finally, experiments are carried out on the KTH action database and on surveillance video data from the CAVIAR project. For the KTH action database, the deep network is trained on images and their corresponding labels to obtain a multi-class classifier. The experimental results show that the features extracted by the spatial and temporal streams are complementary; as a result, the final average accuracy reaches 98.18%, exceeding most results obtained with hand-crafted feature extraction methods. For the dataset built from CAVIAR surveillance video, mirror transformation and random cropping are first used to augment the original data; several separate detector networks are then trained, and the different actions are detected with a sliding time window. The final average detection rate reaches 89.36%, showing that the two-stream convolutional neural network is effective for action recognition in surveillance video.
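The claim that a stack of small convolutions keeps the receptive field while cutting parameters can be checked with simple arithmetic. The sketch below assumes stride-1 3x3 layers, equal input and output channel counts C, and ignores biases; the helper names and C = 256 are illustrative, not taken from the thesis. The "more discriminative" part of the claim comes from the extra nonlinearity (e.g. ReLU) between stacked layers, which the arithmetic does not show.

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field of `num_layers` stacked stride-1 conv layers of size kernel x kernel."""
    # Each extra stride-1 layer widens the field by (kernel - 1) pixels.
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel: int, channels: int) -> int:
    """Weight count of one conv layer with `channels` inputs and outputs (biases ignored)."""
    return kernel * kernel * channels * channels


C = 256  # hypothetical channel width, for illustration only
for n in (2, 3):
    rf = stacked_receptive_field(n)     # 5 for n=2, 7 for n=3
    stacked = n * conv_weights(3, C)    # n * 9 * C^2
    single = conv_weights(rf, C)        # rf^2 * C^2
    print(f"{n} x 3x3 -> {rf}x{rf} field: {stacked:,} vs {single:,} weights")
# prints:
# 2 x 3x3 -> 5x5 field: 1,179,648 vs 1,638,400 weights
# 3 x 3x3 -> 7x7 field: 1,769,472 vs 3,211,264 weights
```

Three 3x3 layers cover the same 7x7 field as one 7x7 layer with 27C² instead of 49C² weights, roughly 45% fewer.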
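The abstract does not give the formula for the dispersion-aware weighted average used to turn single-frame features into a video-level feature. The sketch below is one plausible reading under a loudly stated assumption: each frame is weighted by the population standard deviation of its own feature vector, so frames with flat, low-information activations contribute less. The function name `dispersion_weighted_pool` is hypothetical.

```python
import statistics
from typing import List, Sequence


def dispersion_weighted_pool(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Pool per-frame feature vectors into one video-level vector.

    ASSUMPTION: a frame's weight is the population standard deviation of its
    feature vector, normalised so the weights sum to 1. This is an illustrative
    choice, not the thesis's published formula.
    """
    spreads = [statistics.pstdev(f) for f in frame_features]
    total = sum(spreads)
    if total == 0:
        # Every frame is flat: fall back to a plain mean.
        weights = [1.0 / len(frame_features)] * len(frame_features)
    else:
        weights = [s / total for s in spreads]
    dim = len(frame_features[0])
    # Weighted sum over frames, one output value per feature dimension.
    return [sum(w * f[d] for w, f in zip(weights, frame_features)) for d in range(dim)]


frames = [[1.0, 1.0, 1.0, 1.0], [0.0, 2.0, 0.0, 2.0]]
print(dispersion_weighted_pool(frames))  # [0.0, 2.0, 0.0, 2.0]
```

A plain mean of the same two frames would give `[0.5, 1.5, 0.5, 1.5]`; under this weighting the flat first frame is ignored entirely.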
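Detecting actions in continuous surveillance footage with a sliding time window can be sketched as follows, assuming each action-specific detector network outputs one score per frame. The window length, stride, threshold, and the merging of overlapping hits into segments are all hypothetical choices for illustration; the abstract does not specify them.

```python
from typing import List, Sequence, Tuple


def sliding_window_detect(scores: Sequence[float], window: int, stride: int,
                          threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, whose mean score >= threshold.

    HYPOTHETICAL: `scores` are per-frame probabilities from one detector network.
    """
    hits: List[Tuple[int, int]] = []
    for start in range(0, len(scores) - window + 1, stride):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            hits.append((start, start + window))
    return hits


def merge_overlapping(ranges: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or touching ranges (input sorted by start) into segments."""
    merged: List[Tuple[int, int]] = []
    for start, end in ranges:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


scores = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
windows = sliding_window_detect(scores, window=3, stride=1, threshold=0.5)
print(windows)                     # [(1, 4), (2, 5), (3, 6)]
print(merge_overlapping(windows))  # [(1, 6)]
```

With one detector per action, the same loop would simply be run once per detector's score sequence.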
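The mirror-plus-random-crop augmentation mentioned for the CAVIAR data can be sketched as below. To stay dependency-free, a frame is a single-channel list of rows; the crop size, the number of crops per source, and the helper names are hypothetical. For a multi-frame sample (for example a temporal-stream input), one would normally apply the same flip and crop offsets to every frame so the motion stays consistent.

```python
import random
from typing import List

Image = List[List[int]]  # H x W grid of pixel values (single channel, for brevity)


def mirror(img: Image) -> Image:
    """Horizontal mirror transformation: reverse every row."""
    return [row[::-1] for row in img]


def random_crop(img: Image, ch: int, cw: int, rng: random.Random) -> Image:
    """Cut a ch x cw patch at a random top-left corner."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - ch)    # randint is inclusive on both ends
    left = rng.randint(0, w - cw)
    return [row[left:left + cw] for row in img[top:top + ch]]


def augment(img: Image, ch: int, cw: int, n_crops: int, seed: int = 0) -> List[Image]:
    """Return n_crops crops of the original followed by n_crops crops of its mirror."""
    rng = random.Random(seed)       # seeded for reproducible augmentation
    out: List[Image] = []
    for source in (img, mirror(img)):
        out.extend(random_crop(source, ch, cw, rng) for _ in range(n_crops))
    return out


frame = [[0, 1, 2], [3, 4, 5]]
samples = augment(frame, ch=2, cw=2, n_crops=3)
print(len(samples))  # 6
```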
Keywords/Search Tags: action recognition, deep learning, convolutional neural networks, KTH action database