Enabling Technologies Of Real-world Video Action Recognition

Posted on: 2020-10-06
Degree: Master
Type: Thesis
Country: China
Candidate: Q X Zhao
GTID: 2428330620456151
Subject: Information and communication engineering
Abstract/Summary:
Video action recognition is an active research topic in computer vision. The task centers on extracting and analyzing as much of the appearance and motion information in a video as possible. In recent years, deep learning has become the mainstream approach to video action recognition and has produced many strong methods and network architectures. Because long-range dependencies in video are indispensable, this thesis focuses on the non-local (NL) module in 3D convolutional networks and in the Efficient Convolutional Network for Online Video Understanding (ECO), as well as on anomaly detection based on non-local networks. The main contributions are as follows.

First, a (2+1)D network based on the non-local operation is proposed. The Non-local Inflated 3D ConvNet (NL-I3D) is an advanced network structure for action recognition. Non-local operations are a generic family of building blocks for capturing long-range dependencies, inspired by the classical non-local means method. To optimize the structure of the NL-I3D network, the 3D convolution is replaced by a (2+1)D convolution, yielding the NL-R(2+1)D network. Experimental results show that the NL-R(2+1)D network can effectively capture non-local information.

Second, efficient online video understanding networks with non-local information fusion are proposed to meet the requirement of real-time recognition. The efficient online video understanding network divides the input video into segments and samples from them, learns the appearance features of single frames with a 2D convolutional network, and then applies a 3D convolutional network to learn inter-frame relationships. Building on the ECO network, this thesis combines the NL module with 3D convolution to learn inter-frame relationships, especially long-range ones, which improves the accuracy of the network while preserving its real-time performance.

Third, anomaly detection methods based on multiple-instance learning and non-local networks are proposed. Because anomalous events are sparse and highly diverse, a multiple-instance learning framework is adopted for anomaly decisions. The methods first group the videos into bags and split each video into segments, then perform feature extraction and anomaly scoring on the video segments in the positive and negative bags. Experimental results show that the multiple-instance anomaly detection methods based on non-local networks can accomplish the abnormal behavior detection task.
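The non-local operation that the abstract builds on can be illustrated with a minimal NumPy sketch. It assumes the embedded-Gaussian form of the operation (softmax over pairwise dot products) with a residual connection; the abstract does not say which variant the thesis uses. The function name `non_local_block`, the weight names, and the `(T, H, W, C)` layout are hypothetical choices for illustration, and the 1x1x1 convolutions of a real network are written here as per-position matrix products.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def non_local_block(x, w_theta, w_phi, w_g, w_out):
    """Embedded-Gaussian non-local operation with a residual connection.

    x       : feature map of shape (T, H, W, C)  (assumed layout)
    w_theta : (C, C_inner) projection for queries
    w_phi   : (C, C_inner) projection for keys
    w_g     : (C, C_inner) projection for values
    w_out   : (C_inner, C) projection back to C channels
    Returns the output feature map and the (N, N) attention matrix,
    where N = T * H * W is the number of space-time positions.
    """
    t, h, w, c = x.shape
    flat = x.reshape(-1, c)                 # (N, C): one row per position

    theta = flat @ w_theta                  # (N, C_inner)
    phi = flat @ w_phi                      # (N, C_inner)
    g = flat @ w_g                          # (N, C_inner)

    # Every position attends to every other position in space and time,
    # which is what lets the block capture long-range dependencies.
    attn = softmax(theta @ phi.T, axis=-1)  # (N, N), rows sum to 1
    y = attn @ g                            # (N, C_inner)

    z = flat + y @ w_out                    # residual connection
    return z.reshape(t, h, w, c), attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 7, 7, 16))      # 4 frames, 7x7 spatial, 16 channels
    w_theta, w_phi, w_g = (rng.normal(size=(16, 8)) * 0.1 for _ in range(3))
    w_out = rng.normal(size=(8, 16)) * 0.1
    z, attn = non_local_block(x, w_theta, w_phi, w_g, w_out)
    print(z.shape, attn.shape)
```

Because of the residual connection, setting `w_out` to zeros makes the block an identity mapping, which is one reason such a block can be inserted into an existing 3D or (2+1)D backbone without disturbing its initial behavior.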
Keywords/Search Tags: Video Action Recognition, Non-local Network, Abnormal Detection, Temporal Modeling