
Research On Human Action Recognition Based On Temporal Segment Network

Posted on: 2019-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: W W Tao
Full Text: PDF
GTID: 2428330596960556
Subject: Signal and Information Processing
Abstract/Summary:
Video-based action recognition has drawn considerable attention from the academic community, owing to its wide applications in areas such as video retrieval, surveillance, and behavior analysis. The performance of an action recognition system depends, to a large extent, on whether it can extract and utilize relevant information from the video. However, extracting such information is non-trivial due to a number of difficulties, such as scale variations, viewpoint changes, and camera motion. With the successful application of deep learning in the image domain, many deep learning based methods have been extended to video-based action recognition. Unlike images, however, videos carry temporal structure information that is crucial for recognition. For long-range temporal modeling, this thesis focuses on three aspects of the temporal segment network (TSN): untrimmed video processing, temporal information supplementation, and real-time optical flow construction. The main work is as follows:

1. Related work in video-based action recognition is reviewed. Several typical representation methods, including hand-crafted features and deep learning based features, are introduced. Among them, temporal structure modeling is the focus of video-based action recognition.

2. TSN, a well-known deep learning framework for action recognition, is studied, and a multi-scale sliding window integration method for untrimmed video prediction is proposed. Building on the original two-stream convolutional neural network (CNN), TSN models long-range temporal structure with a segment-based sampling and aggregation module. For the prediction of untrimmed video, the proposed multi-scale sliding window integration method uses multi-scale coverage and Top-K pooling to localize actions and suppress the influence of background. Experimental results show that the proposed method is effective for untrimmed video prediction.

3. A four-stream TSN is proposed. To further capture the temporal relationships among video frames, a four-stream TSN based on dynamic images is proposed. The dynamic image, a compact representation of video, encodes temporal data such as RGB or optical flow frames using the concept of rank pooling. From the two original streams, static image and optical flow, two new streams, dynamic image and dynamic optical flow, are generated. Experimental results show that the two new streams are complementary to the original two and improve accuracy.

4. A real-time TSN (RT-TSN) is proposed. Because optical flow is expensive to compute and store, MotionNet, a network based on unsupervised optical flow learning, is introduced to generate optical flow. By stacking MotionNet in front of the temporal-stream CNN in TSN, the resulting RT-TSN is computationally efficient and end-to-end trainable. Experimental results show that RT-TSN is faster than TSN while maintaining similar accuracy.
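The segment-based sampling and aggregation at the core of TSN can be sketched as follows. This is a minimal NumPy illustration, not the thesis's implementation: the function names are invented here, and average consensus is assumed as the aggregation function (TSN also supports other consensus choices).

```python
import numpy as np

def sample_snippets(num_frames, num_segments=3, rng=None):
    """Divide a video's frame indices into equal-length segments and
    randomly sample one snippet index from each (TSN-style sparse
    sampling). Leftover frames at the end are ignored."""
    rng = rng or np.random.default_rng(0)
    seg_len = num_frames // num_segments
    return [int(s * seg_len + rng.integers(seg_len)) for s in range(num_segments)]

def segmental_consensus(snippet_scores):
    """Aggregate per-snippet class scores by averaging (the segmental
    consensus), then apply softmax to get a video-level prediction."""
    avg = np.mean(snippet_scores, axis=0)
    e = np.exp(avg - avg.max())
    return e / e.sum()
```

A snippet index is drawn from each temporal segment, the CNN scores each snippet, and the consensus function fuses the snippet scores into one video-level score.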
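The multi-scale sliding window integration for untrimmed videos can be illustrated in miniature: slide windows of several temporal scales over per-frame class scores, then average only the top-K window scores per class so that background segments are suppressed. The window sizes, the value of K, and the plain averaging used here are illustrative assumptions, not the thesis's exact settings.

```python
import numpy as np

def topk_window_scores(frame_scores, window_sizes=(4, 8), k=2):
    """Score an untrimmed video from per-frame class scores of shape
    (num_frames, num_classes): slide windows of several scales, take
    each window's mean score, then average the top-K windows per class."""
    window_scores = []
    for w in window_sizes:
        for start in range(0, len(frame_scores) - w + 1):
            window_scores.append(frame_scores[start:start + w].mean(axis=0))
    window_scores = np.stack(window_scores)     # (num_windows, num_classes)
    topk = np.sort(window_scores, axis=0)[-k:]  # top-K windows per class
    return topk.mean(axis=0)                    # (num_classes,)
```

Because only the best windows contribute, a short action burst inside a long background video still dominates the video-level score.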
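Dynamic image construction via rank pooling admits a well-known closed-form approximation (approximate rank pooling), in which frame t of a T-frame clip is weighted by the coefficient alpha_t = 2t - T - 1 and the weighted frames are summed. A small sketch, under the assumption that this approximation stands in for full rank pooling as used in the four-stream TSN:

```python
import numpy as np

def dynamic_image(frames):
    """Collapse a clip of shape (T, H, W) or (T, H, W, C) into a single
    dynamic image by approximate rank pooling: a weighted sum whose
    coefficients alpha_t = 2t - T - 1 emphasise later frames."""
    T = frames.shape[0]
    alphas = 2 * np.arange(1, T + 1) - T - 1
    return np.tensordot(alphas, frames.astype(float), axes=1)
```

Since the coefficients sum to zero, a static clip yields an all-zero dynamic image; applying the same operator to optical flow frames gives the "dynamic optical flow" stream.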
Keywords/Search Tags: action recognition, temporal modeling, temporal segment network, dynamic image, optical flow learning