Research On Video Motion Detection Techniques With 3D Convolution And Faster RCNN Neural Networks

Posted on: 2019-12-11  Degree: Master  Type: Thesis
Country: China  Candidate: X Q Nie
GTID: 2428330593450442  Subject: Computer Science and Technology
Abstract/Summary:
The goal of video motion detection is to identify the category of an action in a video and to predict both the temporal boundaries of the action and its spatial position. Since convolutional neural networks achieved major breakthroughs in image classification and recognition, many researchers have begun to apply them to video motion detection. Video motion detection algorithms have broad application prospects in video surveillance analysis and smart home care.

Temporal action detection is an important task in video motion detection. It not only identifies the type of action in the video but also indicates the precise time period during which the action occurs. At present, the temporal action detection methods with the best results convert the video into images and then classify those images. These algorithms ignore the correlation between video frames, so their temporal action detection accuracy stays below 20%. To address this problem, the temporal action detection method in this thesis first uses a three-dimensional convolutional neural network to extract the spatio-temporal features of video actions. These features serve as a shared convolutional feature map for both the temporal proposal network and the classification network. The temporal proposal network then borrows the RPN from Faster RCNN to generate temporal candidate segments of variable length. At this stage, the network generates k anchor segments at each temporal position of the shared feature map, where each anchor segment is a predefined multiscale window centered on that position. The temporal proposal network predicts whether each candidate segment is an action or background, and it also predicts the relative offsets of the center position and the length between the candidate segment and the ground-truth segment. The candidate segments finally output by the network are filtered by non-maximum suppression, which eliminates candidates with high overlap and low confidence, and anchor segments with the same temporal scale are linked. Finally, the classification network uses 3D RoI pooling to obtain fixed-size features for each temporal candidate segment, classifies the segment into a specific action category, and further refines its temporal boundaries. The temporal action detection results in this thesis are 6.8% higher than those of Shou, and the detection speed increases more than fourfold.

Action localization is another important task in video motion detection. Unlike temporal action detection, it detects the spatial position of the action rather than its temporal boundaries. Because current untrimmed datasets lack spatial location annotations, the action localization method in this thesis uses weak supervision to generate the spatial bounding box of the action. First, the fully connected layer in the Faster RCNN network structure is replaced with a convolutional layer, and global average pooling is used to obtain activation maps for the different action classes. Next, a threshold method extracts the rough contour of the action, and the contour is refined. A bounding rectangle is then generated for each contour. Finally, these bounding rectangles are sorted and merged to obtain the final bounding rectangle of the action, which gives the spatial position of the action. The action localization method in this thesis is the first to localize actions spatially on an untrimmed dataset.
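The temporal candidate stage described in the abstract (k multiscale anchors per temporal position, center and length offsets, then non-maximum suppression) can be illustrated with a small sketch. This is not the thesis implementation: the function names `make_anchors`, `decode_offsets`, and `temporal_nms`, the anchor scales, and the thresholds are all hypothetical, and the offset parameterization is assumed to be the Faster RCNN box encoding reduced to one dimension.

```python
import math
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end), measured in feature-map time steps


def make_anchors(num_positions: int, scales: List[float]) -> List[Segment]:
    """Place k = len(scales) windows centered on every temporal position."""
    anchors = []
    for t in range(num_positions):
        center = t + 0.5  # assumption: a position's center is the middle of its step
        for s in scales:
            anchors.append((center - s / 2.0, center + s / 2.0))
    return anchors


def decode_offsets(anchor: Segment, d_center: float, d_length: float) -> Segment:
    """Apply predicted relative offsets to an anchor.

    Assumed encoding (1D analogue of Faster RCNN): the center shifts by
    d_center * anchor_length and the length scales by exp(d_length).
    """
    length = anchor[1] - anchor[0]
    center = anchor[0] + length / 2.0
    new_center = center + d_center * length
    new_length = length * math.exp(d_length)
    return (new_center - new_length / 2.0, new_center + new_length / 2.0)


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1D segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(segments: List[Segment], scores: List[float],
                 iou_threshold: float = 0.7,
                 score_threshold: float = 0.0) -> List[int]:
    """Return indices of kept segments, highest score first.

    Drops low-confidence segments and any segment that overlaps an
    already kept, higher-scoring segment by more than iou_threshold.
    """
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if scores[i] < score_threshold:
            continue
        if all(temporal_iou(segments[i], segments[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

For example, `make_anchors(4, [1, 2, 4])` yields 12 anchors (three scales at each of four positions), and `temporal_nms` on two heavily overlapping segments keeps only the higher-scoring one.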
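The weakly supervised localization steps (class activation map from global average pooling, thresholding, bounding rectangles, sorting and merging) can likewise be sketched in plain Python. The abstract does not give the threshold, the contour refinement, or the merge rule, so everything here is an assumption: a relative threshold on the map peak, 4-connected regions standing in for contours, sorting by area, and merging by taking the enclosing rectangle. All function names are hypothetical.

```python
from collections import deque
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive
Map2D = List[List[float]]


def class_activation_map(features: Sequence[Map2D], weights: Sequence[float]) -> Map2D:
    """Weighted sum of K feature maps using one class's weights after global average pooling."""
    h, w = len(features[0]), len(features[0][0])
    return [[sum(wk * fk[r][c] for wk, fk in zip(weights, features))
             for c in range(w)] for r in range(h)]


def boxes_from_map(cam: Map2D, rel_threshold: float = 0.5) -> List[Box]:
    """Threshold the map and return one bounding box per 4-connected region,
    sorted by area, largest first. The relative threshold is an assumption."""
    h, w = len(cam), len(cam[0])
    peak = max(max(row) for row in cam)
    if peak <= 0:
        return []
    mask = [[cam[r][c] >= rel_threshold * peak for c in range(w)] for r in range(h)]
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over one connected region.
            queue = deque([(r, c)])
            seen[r][c] = True
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    boxes.sort(key=lambda b: (b[2] - b[0] + 1) * (b[3] - b[1] + 1), reverse=True)
    return boxes


def merge_boxes(boxes: Sequence[Box]) -> Box:
    """Assumed merge rule: the smallest rectangle enclosing all given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))
```

A lower `rel_threshold` admits weaker activations and therefore tends to produce more, and larger, regions; the value that suits a given network would need to be tuned on validation data.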
Keywords/Search Tags: Video Motion Detection, Faster RCNN, 3D Convolution, Global Average Pooling