
Video Object Detection Based On Attention Mechanism And Multi-Scale Feature Fusion Convolutional Network

Posted on: 2021-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: S Jiang
Full Text: PDF
GTID: 2518306050968859
Subject: Master of Engineering
Abstract/Summary:
In recent decades, object detection, one of the basic tasks in computer vision, has been studied extensively and has produced a series of excellent research results. Image object detection has made tremendous progress in the past few years, and its detection performance has improved significantly. However, in fields such as video surveillance and driver assistance, there is wider demand for video-based object detection. Because video suffers from motion blur, occlusion, diverse changes in object shape, and diverse changes in illumination, applying image object detection alone to video does not give good results. How to exploit an object's temporal information and context in video has become the key to improving video object detection performance. Existing video object detection methods usually suffer from slow operation, a high miss rate, and low accuracy. This thesis improves separable convolution, residual and pyramid structures, the attention mechanism, the evolutionary algorithm, and the model pruning algorithm, and applies them to video object detection to improve both the detection accuracy and the running speed of the model, thereby promoting the development of video object detection technology. The research contents of this thesis are as follows:

1. To address the difficulty of detecting small objects and the high computational complexity of the model, this thesis proposes a video object detection algorithm based on multi-scale fusion and a residual pyramid network, called MRDS-FPN. The method first introduces separable convolution to reduce the number of parameters and the computational complexity, uses a stackable separable-convolution residual structure to extract basic features, and designs a feature pyramid structure to fuse the multi-scale features of the network and strengthen feature representation. The feature extraction sub-network is combined with an optical flow network to extract image information and motion information, and with the R-FCN detection sub-network to perform video object detection. Through the design of the network structure, this method addresses the two problems named above. Experiments were run on the augmented ImageNet VID dataset and compared with other recent video object detection algorithms. The experimental results demonstrate the effectiveness of the method in approaching video object detection from the perspective of network structure design. The remaining limitation is that the network structure needs further refinement to raise detection accuracy while also optimizing detection speed.

2. Because video object detection datasets contain a large amount of motion blur, occlusion, and diverse changes in object shape, a video object detection method based on an attention mechanism and a weighted residual pyramid network is proposed, called AWR-FPN. It improves the network's ability to represent features while reducing the computational cost of the model. Building on the multi-scale fusion residual pyramid network MRDS-FPN, the method introduces a selective attention mechanism in the feature extraction stage. Thanks to the residual structure and the deeper network layers, the residual network can better focus on the feature map region where the object category is located. Channel attention modeling enables the classification network to better separate out irrelevant features and suppress information that degrades the classification results. By assigning different weights to different channels of the feature map, channel selection suppresses background information and enhances foreground information, refining the features and thereby improving overall detection accuracy. In the detection algorithm, extracting features with a more accurate feature extraction network benefits the subsequent classification and regression predictions. Finally, experiments were run on the augmented ImageNet VID dataset and compared with other recent video object detection algorithms. The experimental results demonstrate the effectiveness of the method in approaching video object detection from the perspective of network structure design. The remaining limitation is that the network structure needs to be simplified to improve the detection speed and real-time performance of the model.

3. To address the slow operation of video object detection algorithms, a video object detection method based on an attention mechanism and an evolutionary-pruning convolutional network is proposed. The method proposes a new pruning scheme based on an evolutionary algorithm and applies it to the AWR-FPN network constructed in Chapter 3. The evolutionary pruning algorithm removes redundant filters from the network, thereby accelerating the convolutional neural network. Applying the pruning algorithm to the trained attention-weighted residual pyramid network removes a large number of redundant convolution filters, which reduces the risk of overfitting, greatly simplifies the network structure, and reduces the number of parameters. The pruned network is easier to deploy on embedded devices, and its inference speed is significantly faster. The evolutionary algorithm is used to optimize the pruning scheme, and the filters to be pruned are jointly encoded; this is highly flexible and makes full use of the correlation between filters to accelerate the network while improving network performance. Finally, a detailed comparison with competing methods on the augmented ImageNet VID dataset verifies the effectiveness of the method, and a national invention patent has been applied for based on it. The open problem for this algorithm is how to optimize the training of the network.
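
The abstract states that separable convolution reduces the number of parameters. As a rough illustration only, the sketch below compares the weight count of a standard convolution with that of a depthwise separable convolution (a depthwise k x k convolution followed by a 1 x 1 pointwise convolution). The function names, the choice of the depthwise separable variant, and the layer sizes are assumptions made for this example; they are not taken from the thesis, which does not give its layer configuration here.

```python
def standard_conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weight count of a depthwise separable convolution (bias terms ignored).

    Depthwise step: one k x k filter per input channel.
    Pointwise step: a 1 x 1 convolution mapping c_in channels to c_out.
    """
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def reduction_ratio(k, c_in, c_out):
    """Separable weights divided by standard weights; equals 1/c_out + 1/k**2."""
    return separable_conv_params(k, c_in, c_out) / standard_conv_params(k, c_in, c_out)


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 256 input and 256 output channels.
    k, c_in, c_out = 3, 256, 256
    print("standard :", standard_conv_params(k, c_in, c_out))
    print("separable:", separable_conv_params(k, c_in, c_out))
    print("ratio    :", round(reduction_ratio(k, c_in, c_out), 4))
```

For this hypothetical layer the separable form needs 67,840 weights against 589,824 for the standard form, a ratio of about 0.115. The actual savings in MRDS-FPN depend on its real layer sizes.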
Keywords/Search Tags: Video Object Detection, Separable Convolution, Feature Pyramid, Attention Mechanism, Model Pruning