
Temporal Action Localization And Action Recognition Based On Deep Learning

Posted on: 2022-08-31, Degree: Master, Type: Thesis
Country: China, Candidate: Q Li, Full Text: PDF
GTID: 2518306512962029, Subject: Computer Science and Technology
Abstract/Summary:
Video analysis and understanding is one of the basic tasks in computer vision, and mainly comprises action recognition and temporal action localization. Temporal action localization finds the start time and end time of each action in an untrimmed video and generates the final action proposals. Action recognition takes the generated action segments and determines their category through feature extraction and feature classification. To address unresolved problems in both tasks, this thesis studies deep-learning models for temporal action localization and for action recognition. The research content is as follows.

(1) Temporal action localization model based on multi-granularity features. Existing temporal action localization models use information at only a single granularity, which leads to inaccurate localization. To address this, a multi-granularity network is designed to generate high-quality temporal action proposals. The model first uses a trained two-stream convolutional neural network to extract the action features of each sampled segment. Because action boundaries are sensitive to fine-granularity features, the model then uses these features to determine the start time and end time of the action and to generate the corresponding confidence scores. Next, a multi-scale network extracts multi-granularity features of the action, which are used to determine the probability of the action center point at each granularity. Finally, accurate candidate proposals are generated by combining the fine-granularity action boundary probability with the multi-granularity action center point probability. Experiments on the ActivityNet-1.3 dataset give an average recall AR@100 of 75.62% and an area under the curve (AUC) of 67.82%. The experimental results show that the proposed method improves on the original method and outperforms other existing temporal action localization algorithms.

(2) Human action recognition model based on a tightly coupled spatiotemporal two-stream convolutional neural network. Video action recognition suffers from low utilization of action information and insufficient attention to temporal information. To address these problems, a human action recognition model based on a tightly coupled spatiotemporal two-stream convolutional neural network is proposed. First, two 2D convolutional neural networks extract the spatial and temporal features of the video. Second, the forget gate module of the long short-term memory network establishes feature-level tightly coupled connections between the sampled segments so that information is passed from one segment to the next. Then a bi-directional long short-term memory network evaluates the importance of each sampled segment and assigns it an adaptive weight. Finally, the spatiotemporal features are combined to complete human action recognition. The accuracy on the UCF101 and HMDB51 datasets is 94.6% and 70.9%, respectively. The experimental results show that the proposed model effectively improves the utilization of temporal information and the ability to represent overall motion, thus significantly improving the accuracy of human action recognition.
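As a rough illustration of the adaptive weighting step in part (2), the sketch below normalizes per-segment importance scores into weights and uses them to fuse segment features. It is only a minimal sketch under stated assumptions: the abstract says a bi-directional LSTM produces the importance of each segment, but here the scores are plain input numbers, and the softmax normalization, the weighted sum, and the names `softmax` and `fuse_segments` are hypothetical choices that the abstract does not confirm.

```python
import math


def softmax(scores):
    """Turn raw importance scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_segments(features, importance):
    """Weighted sum of per-segment feature vectors.

    features:   one feature vector (list of floats) per sampled segment
    importance: one raw importance score per segment; in the thesis these
                would come from a Bi-LSTM, here they are assumed inputs
    """
    if len(features) != len(importance):
        raise ValueError("need exactly one importance score per segment")
    weights = softmax(importance)
    fused = [0.0] * len(features[0])
    for weight, feature in zip(weights, features):
        for i, value in enumerate(feature):
            fused[i] += weight * value
    return weights, fused


# Three segments with toy 2-D features and equal importance scores.
segment_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
importance_scores = [0.0, 0.0, 0.0]
weights, fused = fuse_segments(segment_features, importance_scores)
```

With equal scores every segment receives the same weight, so the fused vector is the plain average of the segment features; raising one score shifts the fused representation toward that segment.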
Keywords/Search Tags: Temporal action localization, Action recognition, Two-stream convolutional neural network, Spatiotemporal model, Feature fusion