Research On Video Object Detection Based On Feature Propagation And Fusion

Posted on: 2021-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Wang
Full Text: PDF
GTID: 2428330614468313
Subject: Information and Communication Engineering
Abstract/Summary:
Object detection recognizes and locates objects, and it plays an important role in basic scene modeling for visual application systems such as video surveillance, intelligent robots, and autonomous driving. Thanks to deep learning, the performance of object detection algorithms on static images has improved greatly. In real application scenarios, however, the input to the visual system is usually a continuous video rather than an independent image. Applying a static-image object detection algorithm directly to each video frame causes the following problems: (1) video image quality is often significantly degraded by motion blur, scale changes, object occlusion, and similar effects, so a single image cannot provide accurate information about the target; (2) the vision system requires real-time performance, and if features are computed for every frame, the algorithm cannot run at the required speed; (3) video images are spatiotemporally consistent, and detecting each frame independently does not make full use of temporal information.

This thesis studies video object detection algorithms based on feature propagation and fusion. The feature fusion module improves detection accuracy by modeling the spatiotemporal relationships of the target, and the feature propagation module improves detection speed by exploiting the redundancy between adjacent frames. The main work and contributions of this thesis cover the following three aspects:

1. A video object detection algorithm based on a recurrent neural network is proposed. A recurrent neural network retains memory, so it can extract temporal information from video. The algorithm uses an improved recurrent unit to transfer and aggregate features between frames, which strengthens the feature representation of the current frame. To mitigate the effect of motion offset on the aggregation module, optical flow is used to align features between frames. The optical flow computation is integrated into the network, so the framework remains end-to-end. The algorithm also satisfies the causality requirement of practical systems. Experiments on the public ImageNet VID dataset show that the algorithm effectively improves the accuracy of video object detection and achieves better or comparable performance even when compared with non-causal algorithms.

2. A video object detection algorithm based on the self-attention mechanism is proposed. To address the recurrent neural network's limitation to short-range dependencies, this algorithm builds the feature aggregation module with self-attention instead. Self-attention can capture both short-range and long-range information directly, and it increases the parallelism of the computation. The algorithm aggregates features directly at the level of regions of interest: on the one hand, this models the relationships between targets; on the other hand, it avoids the optical-flow-based feature alignment step. To fully exploit information from the whole video, the algorithm shuffles the frames of the input video in advance so that each frame receives several random auxiliary frames. Experiments on the public ImageNet VID dataset show that this algorithm is more accurate than other existing algorithms.

3. A fast video object detection algorithm based on a key frame strategy is proposed. Although dense feature computation and aggregation significantly improve the accuracy of video object detection, they slow the running speed. This algorithm introduces a key frame strategy and proposes a new video object detection framework that balances accuracy and speed: features are computed and aggregated for key frames, and features are propagated for non-key frames. Feature propagation from key frames to non-key frames is performed by an optical-flow-based propagation module. Aggregation between key frames can be performed by an aggregation module based either on an improved recurrent neural network or on a two-dimensional self-attention mechanism. Experiments on the public ImageNet VID dataset show that this algorithm significantly improves the speed of video object detection while also achieving better accuracy than the frame-by-frame detection baseline.
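
The key frame strategy in the third contribution can be illustrated with a small toy sketch. It is not the thesis implementation, and everything in it is an assumption made for illustration: feature maps are 1-D lists, the "optical flow" is a single integer shift per frame, key frames are picked at a fixed interval (the abstract does not say how they are selected), a plain weighted average stands in for the recurrent or self-attention aggregation module, and the function names (`extract_features`, `warp`, `aggregate`, `run_video`) are hypothetical.

```python
from typing import List, Tuple

Feature = List[float]


def extract_features(frame: List[float]) -> Feature:
    """Hypothetical stand-in for the expensive backbone network."""
    return [2.0 * x for x in frame]


def warp(feature: Feature, shift: int) -> Feature:
    """Move a 1-D feature map `shift` positions to the right, zero padded.

    A toy replacement for optical-flow-based feature alignment.
    """
    n = len(feature)
    return [feature[i - shift] if 0 <= i - shift < n else 0.0 for i in range(n)]


def aggregate(memory: Feature, current: Feature, weight: float = 0.5) -> Feature:
    """Weighted average, standing in for the learned aggregation module."""
    return [weight * m + (1.0 - weight) * c for m, c in zip(memory, current)]


def run_video(
    frames: List[List[float]], flows: List[int], key_interval: int = 3
) -> Tuple[List[Feature], int]:
    """Return per-frame features and the number of backbone calls.

    flows[t] is the assumed displacement from frame t-1 to frame t.
    """
    outputs: List[Feature] = []
    backbone_calls = 0
    key_feature: Feature = []
    shift_since_key = 0
    for t, frame in enumerate(frames):
        if t > 0:
            shift_since_key += flows[t]
        if t % key_interval == 0:
            # Key frame: compute features, then fuse with the aligned memory.
            current = extract_features(frame)
            backbone_calls += 1
            if key_feature:
                current = aggregate(warp(key_feature, shift_since_key), current)
            key_feature = current
            shift_since_key = 0
            outputs.append(current)
        else:
            # Non-key frame: propagate the key-frame features, no backbone call.
            outputs.append(warp(key_feature, shift_since_key))
    return outputs, backbone_calls


# A single "object" moving one position to the right in each frame.
frames = [[1.0 if i == t else 0.0 for i in range(6)] for t in range(4)]
flows = [0, 1, 1, 1]
features, calls = run_video(frames, flows, key_interval=3)
```

In this toy setting the propagated features match what the backbone would have produced for every frame, while the backbone runs on only two of the four frames. That is the accuracy and speed trade-off the key frame strategy aims for; with real optical flow the propagation is only approximate.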
Keywords/Search Tags: video target detection, feature propagation and fusion, recurrent neural network, self-attention mechanism, optical flow