
Video Activity Recognition Based On Deep Learning

Posted on: 2020-07-14    Degree: Master    Type: Thesis
Country: China    Candidate: Y Qian    Full Text: PDF
GTID: 2428330623959885    Subject: Software engineering
Abstract/Summary:
Human activity recognition is an important problem in computer vision with a wide range of application scenarios. While recent advances such as deep learning have produced strong results on image-related tasks, recognizing behavior in videos remains difficult because videos contain a great deal of interference. Moreover, recognizing activity in untrimmed videos is harder still, because the task also requires localizing the start and end frames of each detected activity. This thesis proposes an algorithm framework called DT-3DResNet-LSTM for temporal activity recognition. The overall framework consists of three parts, and the major contributions are as follows.

1. Mask R-CNN is first used to detect objects in each video frame. The detected object positions are then fed into the object tracking model, which yields the motion trajectories of multiple different objects across successive frames. Finally, only the continuous video frames containing the detected objects are passed to the activity recognition module, which identifies and localizes the video activity.

2. A deep ResNeXt model is combined with an LSTM to handle activity recognition and localization. The ResNeXt model is pre-trained on the Kinetics dataset to better capture the features of the input video. These features are then fed into the LSTM network to find the temporal localization of the activity. Experimental results show that combining a CNN with an RNN yields more accurate activity classification and temporal localization.

3. Multi-category targets are tracked in the object tracking module. The proposed tracking model ignores targets that are predicted to be the same object but lie far apart between frames, which improves the accuracy of the tracking module. In addition, the IoU (Intersection over Union) between the bounding box predicted by the tracker and the bounding box from object detection is computed to obtain the object type of the tracked object.

Comparative experiments show that DT-3DResNet-LSTM effectively improves performance on behavior recognition and localization. On the one hand, the proposed method achieves higher average accuracy and identifies specific behaviors in the video more accurately. On the other hand, the proposed framework has a lower missed-detection rate than the other recognition methods compared, and detects and identifies the specified behaviors more comprehensively.
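
The two tracking-module ideas described in the abstract (discarding matches that jump too far between frames, and using IoU against detection boxes to recover the object type) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the box format `(x1, y1, x2, y2)`, the function names `iou`, `center_distance`, `is_plausible_step`, and `label_for_track`, and the thresholds `max_dist=50.0` and `min_iou=0.5` are all assumptions chosen for illustration.

```python
from math import hypot

# Assumed box format: (x1, y1, x2, y2) in pixels, with x1 < x2 and y1 < y2.

def iou(a, b):
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def center_distance(a, b):
    """Euclidean distance between the centers of two boxes."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return hypot(ax - bx, ay - by)

def is_plausible_step(prev_box, new_box, max_dist=50.0):
    """Reject a match whose center moved more than max_dist between frames.

    max_dist is a hypothetical threshold, not a value from the thesis.
    """
    return center_distance(prev_box, new_box) <= max_dist

def label_for_track(track_box, detections, min_iou=0.5):
    """Return the class label of the detection that best overlaps the track box.

    detections is a list of (box, label) pairs from the object detector.
    Returns None when no detection reaches min_iou (a hypothetical threshold).
    """
    best_label, best_iou = None, min_iou
    for det_box, label in detections:
        score = iou(track_box, det_box)
        if score >= best_iou:
            best_label, best_iou = label, score
    return best_label
```

For example, `label_for_track((0, 0, 10, 10), [((0, 0, 10, 9), "person"), ((50, 50, 60, 60), "car")])` picks `"person"`, since only that detection overlaps the tracked box. In practice the distance threshold would likely need to scale with frame rate and object size.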
Keywords/Search Tags:Activity recognition, LSTM, ResNet, object detection