Action Classification And Recognition Of Surveillance Videos In Financial Field

Posted on: 2020-01-01  Degree: Master  Type: Thesis
Country: China  Candidate: Y L Pan  Full Text: PDF
GTID: 2428330572971197  Subject: Electronic Science and Technology
Abstract/Summary:
Automatically recognizing actions in video is an active research topic in video modeling. With the development of technology in recent years, machines are replacing human operators in some application scenarios. Because machines are tireless and precise, they greatly reduce the demand for manpower, improve execution efficiency, and promote economic development. However, where supervision is lacking, the spread of automation has also given rise to criminal behavior. This project studies video modeling with deep learning.

Traditional methods mostly combine different feature descriptors of a video into a final feature vector and then classify the video by feeding that vector into a classifier. Such methods spend much time on feature extraction. Moreover, the extracted features generalize poorly and can identify only certain specific actions: once an action varies substantially, recognition accuracy drops sharply. This project instead models video with deep neural networks, which learn video representations automatically from a large number of training samples, improving both the efficiency and the accuracy of action recognition. Based on a review of the action recognition literature in China and abroad, a new network architecture named the "motion-enhanced semi-3D network" is proposed. It focuses on encoding the motion information in videos and greatly improves the recognition accuracy of actions in financial scenarios. The contributions of this thesis are as follows.

First, a financial security surveillance video dataset was filmed and constructed for neural network training. To standardize the action patterns and speed up filming, the scope of this project is limited to using deep learning to make a binary judgment of whether the behavior in a video is violent, and the video content is limited to interactions that may occur around an automated teller machine (ATM). The thesis proposes a set of template actions, based on which a large amount of surveillance video was filmed. Because the original videos contain noise and the amount of data is insufficient to train a neural network, the original videos undergo manual cutting and other preprocessing to remove noise and augment the data. In addition, some neural network models require the optical flow of the video as input, so preprocessing also includes frame extraction and optical flow calculation. After preprocessing, the dataset contains 1979 surveillance video samples filmed from a top-view angle, which supports the subsequent testing and comparison of different deep learning algorithms.

Second, six existing neural network models were built: C3D, BN-Inception, 3D-ResNet, Two-Stream, ST-Multiplier and TSN. Their recognition accuracy was tested and compared on the above surveillance video dataset. Because the dataset is small, a variety of regularization methods were added and tested to alleviate overfitting, and specific hyperparameters of each model were tuned to obtain the best recognition accuracy. The experimental comparison shows that the 3D convolutional networks (C3D, BN-Inception, 3D-ResNet) and the two-stream convolutional networks (Two-Stream, ST-Multiplier, TSN) each have advantages and disadvantages in recognizing actions in surveillance videos, so there is still room for improvement.

Third, a new network is proposed. At present there is little research on action recognition in surveillance videos of financial scenarios. This project focuses on modeling the motion information in videos. Combining the advantages of 3D convolutional networks and two-stream convolutional networks, a motion-enhanced semi-3D network structure is proposed for action recognition in financial surveillance videos. Experimental results show that the semi-3D network achieves 99.21% accuracy on the surveillance video dataset, which is higher than the results of the 3D convolutional networks and the two-stream convolutional networks, and that it is highly robust. It offers ideas for further improving algorithms in this field and has practical significance.
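
To make the two-stream baselines mentioned in the abstract more concrete, the following is a minimal sketch of generic late score fusion for a binary violent/non-violent decision. It is not the thesis's semi-3D network or its evaluation code: the function names, the `flow_weight` parameter, and the label convention (0 = non-violent, 1 = violent) are assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_stream(rgb_logits, flow_logits, flow_weight=0.5):
    """Weighted average of the per-class probabilities of two streams.

    rgb_logits:  class scores from an appearance (RGB frame) stream
    flow_logits: class scores from a motion (optical flow) stream
    flow_weight: hypothetical mixing weight given to the motion stream
    """
    rgb = softmax(rgb_logits)
    flow = softmax(flow_logits)
    return [(1 - flow_weight) * r + flow_weight * f for r, f in zip(rgb, flow)]


def predict(rgb_logits, flow_logits, flow_weight=0.5):
    """Return the index of the class with the highest fused score."""
    scores = fuse_two_stream(rgb_logits, flow_logits, flow_weight)
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(predictions, labels):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

Raising `flow_weight` gives the motion stream more influence on the decision, which is one simple way to emphasize motion information; the architecture proposed in the thesis addresses this inside the network rather than at the score level.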
Keywords/Search Tags: financial scenario, action recognition, deep neural network, motion-enhanced semi-3D neural network