Video Object Detection Based On Deep Feature Flow Learning And Selective Attention Mechanism

Posted on: 2021-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: Q C Sun
GTID: 2518306050973459
Subject: Circuits and Systems
Abstract/Summary:
With the rapid development of deep learning, video object detection has attracted growing attention. Although video object detection algorithms have become an active research topic and have achieved several breakthroughs, problems remain: low detection accuracy, poor performance on blurred objects, and inaccurate localization of occluded objects. This thesis applies a selective attention mechanism, an adaptive key video frame extraction method, and the distance intersection over union (DIoU) regression loss function to the video object detection task in order to improve detection accuracy. The main work and contributions are as follows.

1. In video object detection, every channel of the feature map receives the same attention, which lowers detection accuracy. To address this, the thesis improves the residual feature extraction network of the deep feature flow (DFF) video object detection algorithm and proposes the RECA-Net channel attention network model. The channel attention block realizes local cross-channel information interaction and enhances the significance of the object feature channels in the feature map, so that the model captures the important information in each video frame during training and detects more accurately. Experiments on the public ILSVRC 2017 VID dataset show that RECA-Net achieves higher detection accuracy than the DFF, FGFA, and TPN video object detection algorithms.

2. Objects in video frames may be blurred or may move with large amplitude, which leads to an unreasonable selection of key video frames. To address this, the thesis proposes an adaptive key video frame selection method. The method first computes the sharpness of the current video frame with the Laplacian gradient function, and then uses the optical flow network to compute the total displacement of all pixel positions between the current video frame and its nearest key video frame, in order to determine whether the object is moving violently. The current frame is used as a new key video frame only when it is sharp enough and the object is moving violently. This selection method guarantees the quality of the key video frames and also improves detection accuracy to a certain extent. Experiments on the ILSVRC 2017 VID dataset show that the video object detection algorithm based on adaptive key video frame extraction achieves higher detection accuracy than the DFF, FGFA, TPN, and RECA-Net algorithms.

3. Bounding box regression can be unstable during training, which may cause the network model to diverge. To address this, the thesis improves the regression loss function of the video object detection network model by using the distance intersection over union (DIoU) regression loss for the bounding box regression part. The improved loss takes into account the overlap area between the predicted bounding box and the ground-truth box, their center points, the overlap ratio, and related factors, so that bounding box regression becomes more stable and effective, which further improves the accuracy of the video object detection algorithm. Experiments on the ILSVRC 2017 VID dataset show that training with the DIoU regression loss improves both the detection ability and the generalization ability of the model. Compared with the video object detection algorithms based on DFF, FGFA, TPN, RECA-Net, and adaptive key video frame extraction, the algorithm based on the DIoU regression loss achieves higher detection accuracy.
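
As a rough illustration of items 2 and 3 above, the following Python sketch shows one way the key-frame decision and the DIoU regression loss could be written. It is not the thesis implementation. The 4-neighbour Laplacian kernel, the mean-absolute-response sharpness score, the two threshold parameters, and all function names are assumptions chosen for illustration, and the optical flow field is taken as an input rather than produced by a flow network. The loss follows the commonly published DIoU form, 1 - IoU + d^2 / c^2, where d is the distance between box centers and c is the diagonal of the smallest enclosing box.

```python
import math


def laplacian_sharpness(gray):
    """Mean absolute 4-neighbour Laplacian response over interior pixels.

    `gray` is a 2D list of grayscale intensities. The kernel and the
    averaging are illustrative assumptions, not the thesis's exact metric.
    """
    h, w = len(gray), len(gray[0])
    total = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            total += abs(lap)
    return total / max((h - 2) * (w - 2), 1)


def total_displacement(flow):
    """Sum of per-pixel flow magnitudes; flow[y][x] is a (dx, dy) pair."""
    return sum(math.hypot(dx, dy) for row in flow for dx, dy in row)


def is_new_key_frame(gray, flow, sharp_thresh, motion_thresh):
    """A frame becomes a key frame only if it is sharp AND motion is large.

    Both thresholds are hypothetical tuning parameters.
    """
    return (laplacian_sharpness(gray) >= sharp_thresh
            and total_displacement(flow) >= motion_thresh)


def diou_loss(box, gt):
    """DIoU loss for two boxes given as (x1, y1, x2, y2)."""
    # Intersection area (zero when the boxes do not overlap).
    iw = max(min(box[2], gt[2]) - max(box[0], gt[0]), 0.0)
    ih = max(min(box[3], gt[3]) - max(box[1], gt[1]), 0.0)
    inter = iw * ih
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_box + area_gt - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centers.
    d2 = (((box[0] + box[2]) - (gt[0] + gt[2])) / 2) ** 2 \
        + (((box[1] + box[3]) - (gt[1] + gt[3])) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    ew = max(box[2], gt[2]) - min(box[0], gt[0])
    eh = max(box[3], gt[3]) - min(box[1], gt[1])
    c2 = ew ** 2 + eh ** 2

    return 1.0 - iou + (d2 / c2 if c2 > 0 else 0.0)
```

Unlike a plain IoU loss, the center-distance term still gives a useful signal when the predicted and ground-truth boxes do not overlap at all, which is one plausible reason such a loss makes regression more stable.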
Keywords: Video object detection, Channel attention, Key video frame, Loss function