
Research On Video Action Recognition Based On Deep Learning

Posted on: 2021-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Luo
GTID: 2428330623468344
Subject: Engineering
Abstract/Summary:
In recent years, with the development of 5G networks and the Internet industry, massive amounts of video data are produced every moment in fields such as intelligent video surveillance, self-driving cars, smart connected homes, and emerging micro-video platforms. How to use computers to better understand and identify actions in videos in order to assist decision-making has become a major issue in the related industries. Compared with still images, video contains not only spatial scene information but also temporal context information, which poses greater challenges for video action recognition. Building on popular deep learning technology, this work studies and improves current algorithms. We identify a multi-scale problem in both the spatial and temporal domains of video action recognition: different videos may have different subject scales in the spatial dimension and different action durations and frequencies in the temporal dimension. Drawing on related processing methods from the image field and further reflection on them, we propose a method that first decomposes the 3D convolution into separate spatial and temporal convolution modules, and then expands the two modules into a spatial-temporal multi-scale module composed of parallel multi-scale convolution kernels, with the aim of extracting features that carry richer scale information. Next, we propose an attention module over the three domains of feature channel, space, and time, which aims to strengthen features in the important regions of each domain so that the network can train better. In the experimental part, we integrate the two structures proposed in this thesis into a 3D convolutional network architecture and conduct a series of comparative experiments on the UCF101 video dataset. The proposed algorithm achieves excellent performance without using optical flow or pretraining on a large dataset. Finally, a further experiment shows that accuracy on the action categories with more pronounced spatial and temporal multi-scale problems improves significantly compared with the original 3D convolutional network, which verifies the effectiveness of the proposed architecture.
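
As a rough illustration of the decomposition described in the abstract, the sketch below counts the weights of a full 3D convolution, of a spatial convolution followed by a temporal convolution, and of a parallel multi-scale variant of that pair. The kernel sizes, channel widths, the equal split of output channels across branches, and all function names are assumptions chosen for the example; the abstract does not specify them, and this is not the thesis's implementation.

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Weight count of a 3D convolution with a kt x kh x kw kernel (bias omitted)."""
    return c_in * c_out * kt * kh * kw


def factorized_params(c_in, c_out, kt, k):
    """Spatial 1 x k x k convolution followed by a temporal kt x 1 x 1 convolution.

    Assumption: the intermediate channel width equals c_out.
    """
    spatial = conv3d_params(c_in, c_out, 1, k, k)
    temporal = conv3d_params(c_out, c_out, kt, 1, 1)
    return spatial + temporal


def multiscale_params(c_in, c_out, spatial_sizes, temporal_sizes):
    """Parallel spatial branches, then parallel temporal branches.

    Assumption: each branch produces an equal share of c_out channels and the
    branch outputs are concatenated back to c_out channels.
    """
    if c_out % len(spatial_sizes) or c_out % len(temporal_sizes):
        raise ValueError("c_out must divide evenly across the branches")
    s_share = c_out // len(spatial_sizes)
    t_share = c_out // len(temporal_sizes)
    spatial = sum(conv3d_params(c_in, s_share, 1, k, k) for k in spatial_sizes)
    temporal = sum(conv3d_params(c_out, t_share, kt, 1, 1) for kt in temporal_sizes)
    return spatial + temporal


if __name__ == "__main__":
    full = conv3d_params(64, 64, 3, 3, 3)
    split = factorized_params(64, 64, kt=3, k=3)
    multi = multiscale_params(64, 64, spatial_sizes=(3, 5), temporal_sizes=(3, 5))
    print("full 3D 3x3x3:        ", full)
    print("spatial + temporal:   ", split)
    print("multi-scale (3 and 5):", multi)
```

Under these assumed settings, the factorized pair uses fewer weights than the full 3x3x3 kernel, and the two-branch multi-scale variant covers both 3 and 5 sized receptive fields in space and time while still staying below the full 3D count.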
Keywords/Search Tags:action recognition, deep learning, 3D convolution, multi-scale, attention