
Research On Video Behavior Classification Technology Based On Spatio-Temporal Features

Posted on: 2021-01-02    Degree: Master    Type: Thesis
Country: China    Candidate: R Song    Full Text: PDF
GTID: 2428330626455885    Subject: Communication and Information System
Abstract/Summary:
With the popularization of 4G and 5G technologies and the rapid development of the mobile Internet, information exchange has become faster than ever, and video has gradually become an indispensable part of people's lives. However, as the number of videos grows exponentially, classifying, monitoring, and disseminating video content has become an urgent problem.

Deep learning has achieved remarkable results in computer vision, making breakthrough progress and reaching unprecedented performance in traditional tasks such as object detection and image classification. In video classification, however, a deep model must extract features in both the spatial and temporal dimensions to describe behavior, which makes high classification accuracy difficult to achieve. At the same time, most current models use optical flow as the temporal input, which greatly limits their speed.

To overcome the slow computation of optical flow, this thesis uses the MotionNet model to estimate optical flow and proposes a new spatio-temporal feature fusion structure built around MotionNet, which greatly improves model speed while maintaining accuracy. To address the above problems, this thesis further combines the spatial and temporal features of the video and introduces an OFF sub-network into the traditional spatio-temporal feature network to improve classification accuracy. The proposed network is also applied to an important task in video behavior classification, abnormal behavior detection, and is optimized for the characteristics of abnormal behavior to improve accuracy.

The work of this thesis is summarized as follows:

(1) Improving model speed. The main factor limiting speed at this stage is the optical flow computation, so we replace the traditional optical flow algorithm with the MotionNet network and combine MotionNet with the conventional optical-flow feature extraction network into a single end-to-end model (a simplified sketch of this layout follows the abstract). Experiments show that, with accuracy preserved, the model speed rises from the original 14 fps to 140 fps.

(2) Improving model accuracy. To make full use of spatial and temporal information, we add a spatio-temporal feature fusion structure that fuses the spatially informative features and the temporal features inside the MotionNet network. To extract more temporal information from the optical flow, we also add an OFF sub-network to the optical flow feature extraction network and feed each layer's features into it (the OFF computation is sketched below). Our model achieves state-of-the-art accuracy on four datasets: UCF-101, HMDB51, MSR Daily Activity 3D, and Florence 3D Actions.

(3) Abnormal behavior detection. To improve the recognition of small actions in abnormal behavior detection, we propose a DIFF stream to extract the corresponding features and, based on the characteristics of such behavior, design a new spatio-temporal feature fusion structure to improve network accuracy. The multiple channels are then fused with weighted averaging, and the best weights are selected for abnormal behavior recognition (a weighted-fusion sketch is given below). The final model reaches 98.52% accuracy on abnormal behavior detection.
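The end-to-end layout of contribution (1) can be outlined roughly as follows. This is a minimal PyTorch sketch, assuming an illustrative MotionNetLite that predicts optical-flow-like motion fields from a stack of RGB frames and a stand-in TemporalStream classifier; the layer sizes, class names, and frame count are assumptions for illustration only, not the network described in the thesis.

    # Minimal sketch: a small MotionNet-like CNN estimates motion fields from
    # stacked RGB frames, and a temporal CNN classifies them end to end, so no
    # precomputed TV-L1 optical flow is needed at inference time.
    import torch
    import torch.nn as nn

    class MotionNetLite(nn.Module):
        """Predicts 2*(T-1) flow channels from T stacked RGB frames."""
        def __init__(self, num_frames=11):
            super().__init__()
            in_ch = 3 * num_frames
            out_ch = 2 * (num_frames - 1)          # (u, v) per frame pair
            self.encoder = nn.Sequential(
                nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            )
            self.flow_head = nn.Conv2d(64, out_ch, 3, padding=1)

        def forward(self, frames):                 # frames: (B, 3*T, H, W)
            return self.flow_head(self.encoder(frames))

    class TemporalStream(nn.Module):
        """Classifies the estimated motion fields (stand-in for the flow CNN)."""
        def __init__(self, flow_channels, num_classes=101):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(flow_channels, 96, 7, stride=2, padding=3),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1),
            )
            self.classifier = nn.Linear(96, num_classes)

        def forward(self, flow):
            return self.classifier(self.features(flow).flatten(1))

    frames = torch.randn(2, 3 * 11, 224, 224)      # batch of 11-frame clips
    motion_net = MotionNetLite(num_frames=11)
    temporal = TemporalStream(flow_channels=2 * 10, num_classes=101)
    logits = temporal(motion_net(frames))          # (2, 101)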
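The OFF sub-network of contribution (2) builds on the idea of optical-flow-guided features: spatial gradients of a frame's feature maps combined with the temporal difference to the next frame's feature maps. The sketch below shows that computation in PyTorch; the Sobel kernels, tensor shapes, and the helper name off_features are illustrative assumptions rather than the thesis implementation.

    # Illustrative OFF-style features: Sobel spatial gradients of one frame's
    # feature maps plus the temporal difference to the next frame's maps,
    # concatenated so a small sub-network can mine extra temporal cues.
    import torch
    import torch.nn.functional as F

    def off_features(feat_t, feat_t1):
        """feat_t, feat_t1: (B, C, H, W) feature maps of consecutive frames."""
        c = feat_t.shape[1]
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        sobel_y = sobel_x.t()
        kx = sobel_x.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
        ky = sobel_y.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
        grad_x = F.conv2d(feat_t, kx, padding=1, groups=c)   # dF/dx
        grad_y = F.conv2d(feat_t, ky, padding=1, groups=c)   # dF/dy
        grad_t = feat_t1 - feat_t                            # dF/dt
        return torch.cat([grad_x, grad_y, grad_t], dim=1)    # (B, 3C, H, W)

    f_t = torch.randn(2, 64, 56, 56)
    f_t1 = torch.randn(2, 64, 56, 56)
    off = off_features(f_t, f_t1)   # fed to the OFF sub-network per layer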
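For contribution (3), the DIFF stream takes frame differences as input, and the final prediction is a weighted fusion of the per-channel scores. The sketch below assumes each stream already outputs class probabilities; the helper names (diff_stream_input, weighted_fusion) and the example weights are assumptions, since the thesis selects the weights experimentally.

    # Sketch of the DIFF stream input and weighted late fusion of stream scores.
    import torch

    def diff_stream_input(frames):
        """frames: (B, T, 3, H, W) -> frame differences that highlight small motions."""
        return frames[:, 1:] - frames[:, :-1]          # (B, T-1, 3, H, W)

    def weighted_fusion(scores, weights):
        """scores: list of (B, num_classes) per stream; weights: same length."""
        w = torch.tensor(weights) / sum(weights)        # normalize channel weights
        return sum(wi * si for wi, si in zip(w, scores))

    rgb_scores  = torch.softmax(torch.randn(2, 101), dim=1)
    flow_scores = torch.softmax(torch.randn(2, 101), dim=1)
    diff_scores = torch.softmax(torch.randn(2, 101), dim=1)
    fused = weighted_fusion([rgb_scores, flow_scores, diff_scores], [1.0, 1.5, 1.0])
    pred = fused.argmax(dim=1)                          # final class per clip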
Keywords/Search Tags:Real-time video classification, Deep learning, Spatio-Temporal Features, Optical flow, Convolutional neural network