Research On Video Object Detection Based On Feature Propagation And Fusion

Posted on:2021-03-13

Degree:Master

Type:Thesis

Country:China

Candidate:Y F Wang

GTID:2428330614468313

Subject:Information and Communication Engineering

Abstract/Summary:

Object detection recognizes and locates objects, and it plays an important role in basic scene modeling for visual application systems such as video surveillance, intelligent robots, and autonomous driving. With the introduction of deep learning, the performance of object detection algorithms for static images has improved greatly. In practical applications, however, the input to the vision system is usually a continuous video rather than an independent image. Applying a static-image object detection algorithm directly to each frame of a video causes the following problems: (1) video image quality is often significantly degraded by motion blur, scale changes, object occlusion, and similar factors, so a single image cannot provide accurate information about the target; (2) the vision system requires real-time performance, and if features are computed for every frame, the algorithm cannot run at the required speed; (3) video has spatiotemporal consistency, and detecting each frame independently does not make full use of temporal information.

This thesis studies video object detection algorithms based on feature propagation and fusion. The feature fusion module improves detection accuracy by modeling the spatiotemporal relationships of targets, and the feature propagation module improves detection speed by exploiting the redundancy between adjacent frames. The main work and contributions of this thesis cover the following three aspects:

1. A video object detection algorithm based on a recurrent neural network is proposed. A recurrent neural network retains memory and can therefore extract temporal information from video. The algorithm uses an improved recurrent unit to transfer and aggregate features between frames, which strengthens the feature representation of the current frame. To mitigate the effect of motion offsets on the aggregation module, optical flow is used to align features between frames. Optical flow computation is integrated into the network, so the framework remains end-to-end. The algorithm is causal, as practical systems require. Experiments on the public ImageNet VID dataset show that the algorithm effectively improves the accuracy of video object detection and achieves better or comparable performance even when compared with non-causal algorithms.

2. A video object detection algorithm based on the self-attention mechanism is proposed. To address the recurrent neural network's limitation to short-range dependencies, this algorithm builds the feature aggregation module from self-attention instead. Self-attention captures both short-range and long-range information directly and increases the parallelism of the computation. The algorithm aggregates features directly at the level of regions of interest; on the one hand this models the relationships between targets, and on the other hand it avoids feature alignment based on optical flow. To make full use of information from the whole video, the algorithm shuffles the input video in advance so that each frame is paired with several random auxiliary frames. Experiments on the public ImageNet VID dataset show that the algorithm is more accurate than other existing algorithms.

3. A fast video object detection algorithm based on a key-frame strategy is proposed. Although intensive feature computation and aggregation significantly improve the accuracy of video object detection, they reduce the running speed. This algorithm introduces a key-frame strategy and proposes a new video object detection framework that balances accuracy and speed: for key frames, features are computed and aggregated, and for non-key frames, features are propagated. Feature propagation from key frames to non-key frames is performed by an optical-flow-based propagation module. Aggregation between key frames can be performed by an aggregation module based on either the improved recurrent neural network or a two-dimensional self-attention mechanism. Experiments on the public ImageNet VID dataset show that the algorithm significantly improves the speed of video object detection while also achieving better accuracy than the baseline method of frame-by-frame detection.
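
The key-frame strategy in the third contribution can be illustrated with a minimal control-flow sketch. This is not the thesis implementation: the function name `run_keyframe_schedule`, the fixed `key_interval`, and the three callables (`extract`, `aggregate`, `propagate`) are hypothetical placeholders. In the thesis, aggregation is done by a recurrent or self-attention module and propagation by an optical-flow-based module; here they are simply passed in as functions, and the abstract does not state how key frames are selected, so a fixed interval is assumed.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
Feature = TypeVar("Feature")


def run_keyframe_schedule(
    frames: Sequence[Frame],
    extract: Callable[[Frame], Feature],
    aggregate: Callable[[Feature, Feature], Feature],
    propagate: Callable[[Feature, Frame, Frame], Feature],
    key_interval: int = 10,
) -> List[Feature]:
    """Return one feature per frame using a fixed-interval key-frame schedule.

    Assumption (not from the thesis): every `key_interval`-th frame is a key frame.
    """
    if key_interval < 1:
        raise ValueError("key_interval must be at least 1")

    outputs: List[Feature] = []
    memory = None      # aggregated feature of the most recent key frame
    key_frame = None   # the most recent key frame itself

    for index, frame in enumerate(frames):
        if index % key_interval == 0:
            # Key frame: run the expensive feature extractor, then fuse the
            # result with the memory carried over from earlier key frames.
            feature = extract(frame)
            memory = feature if memory is None else aggregate(memory, feature)
            key_frame = frame
            outputs.append(memory)
        else:
            # Non-key frame: skip extraction and propagate the key-frame
            # feature to the current frame (e.g. by flow-guided warping).
            outputs.append(propagate(memory, key_frame, frame))
    return outputs
```

With `key_interval=3` and seven frames, `extract` is called only for frames 0, 3, and 6; the other four frames reuse propagated key-frame features, which is where the speed-up described above would come from.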

Keywords/Search Tags:

video target detection, feature propagation and fusion, recurrent neural network, self-attention mechanism, optical flow

Related items

1	Research On Sensor Activity Recognition Based On Improved Deep Recurrent Neural Network
2	Abnormal Behavior Detection In Complex Scene
3	Wide-band Radar High Range Resolution Profile Target Recognition Based On Deep Neural Network
4	Research On The Violent Detection Of Audio And Video Based On Attention Mechanism
5	Research On Video Person Re-identification Based On Deep Learning
6	A Research And Application Of Faster RCNN Target Detection Algorithm Based On Attention Mechanism
7	Video Object Detection Based On Attention Mechanism And Multi-Scale Feature Fusion Convolutional Network
8	Research And Implementation Of Video Action Recognition Based On Long-Time Feature Fusion And Attention Mechanism
9	Optical-Flow-Guided Multi-Keyframes Feature Propagation And Aggregation For Video Object Detection
10	Research On Object Detection Based On Multi-scale Feature Fusion