
Video Behavior Analysis Based On Deep Learning

Posted on: 2021-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: W Zhu
Full Text: PDF
GTID: 2428330611496890
Subject: Electronic and communication engineering
Abstract/Summary:
In recent years, with the growth of the Internet, massive volumes of video are uploaded every day from surveillance systems, web cameras, and individual users. Auditing and screening these videos by hand would require an enormous amount of work that is practically impossible to complete. As a branch of computer vision, video behavior analysis has therefore attracted wide attention from researchers and institutions, having already produced notable research results and economic benefits in areas such as intelligent monitoring systems, video retrieval, and human-computer interaction.

Traditional behavior recognition methods identify the category of a behavior by designing and constructing a model that represents it and analyzing hand-crafted features. However, such methods based on manually designed features involve many stages, incur a large time overhead, and are difficult to optimize as a whole. To address these problems, this thesis proposes a spatial-temporal self-attention motion feature extraction method based on deep learning. The main work is as follows:

(1) Current video behavior analysis methods are surveyed. Among traditional algorithms, hand-crafted feature extraction, classifier design, and the mainstream iDT method are studied; among deep learning methods, the classic two-stream network and TSN, as well as C3D among 3D convolution methods, are studied.

(2) A spatial self-attention mechanism is designed. Because video scenes are complex, extracting key motion features is difficult. This thesis proposes a 3D spatial self-attention mechanism: the attention distribution, computed by combining the scene with inter-frame motion information, focuses on the effective motion area and, to a certain extent, avoids the influence of irrelevant motion features.

(3) A Temporal 3D ResNeXt Block is proposed. Differences in video shooting equipment and video encoding methods can produce different frame rates, and the speed of the same action also varies from person to person. To handle this, the convolution in the temporal dimension uses multi-scale convolution kernels, which increases adaptability across time spans. In terms of model structure, this thesis designs a pyramid structure to obtain the value, query, and key matrices of the spatial self-attention mechanism, so that the features fed into the attention mechanism are richer.

Experimental results on the UCF101 and HMDB51 datasets show that the proposed method effectively improves recognition accuracy.
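The spatial self-attention idea described above, where every spatial position attends to every other position via query, key, and value projections, can be sketched in plain NumPy. The tensor shapes, the random projection matrices, and the scaled-softmax formulation are illustrative assumptions, not the thesis's exact (pyramid-structured, learned) design.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_self_attention(feat, d_k=8, seed=0):
    """Attend over the spatial positions of a (C, H, W) feature map.

    The random Wq/Wk/Wv projections are placeholders standing in for
    the learned pyramid projections used in the thesis.
    """
    C, H, W = feat.shape
    x = feat.reshape(C, H * W).T            # (N, C): one row per position
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((C, d_k))
    Wk = rng.standard_normal((C, d_k))
    Wv = rng.standard_normal((C, C))
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (N, N) weights over positions
    out = attn @ V                          # features re-weighted by attention
    return out.T.reshape(C, H, W), attn

feat = np.random.default_rng(1).standard_normal((4, 5, 5))
out, attn = spatial_self_attention(feat)
print(out.shape, attn.shape)   # (4, 5, 5) (25, 25)
```

Each row of `attn` sums to 1, so every output position is a convex combination of the value vectors, which is how the mechanism can concentrate weight on the effective motion area.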
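The multi-scale temporal convolution used to absorb frame-rate and action-speed variation can likewise be illustrated on a single 1-D temporal signal. Uniform (averaging) kernels of sizes 3, 5, and 7 are assumptions standing in for the learned 3D kernels of the proposed Temporal 3D ResNeXt Block; only the multi-scale idea is shown.

```python
import numpy as np

def multi_scale_temporal_conv(x, kernel_sizes=(3, 5, 7)):
    """Convolve a (T,) temporal signal with kernels of several sizes
    and stack the results, one output row per temporal scale.

    Uniform placeholder kernels; in the thesis these weights are learned.
    """
    outs = []
    for k in kernel_sizes:
        kernel = np.ones(k) / k                      # averaging placeholder
        outs.append(np.convolve(x, kernel, mode="same"))
    return np.stack(outs)                            # (num_scales, T)

t = np.arange(16, dtype=float)
y = multi_scale_temporal_conv(np.sin(t / 2))
print(y.shape)   # (3, 16)
```

A fast action is captured well by the small kernel while a slow one is summarized by the large kernel; stacking the scales lets later layers pick whichever temporal span fits.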
Keywords/Search Tags: deep learning, video behavior recognition, self-attention, feature fusion, two-stream network