Research On Video Object Detection Based On Multi-Attention Mechanism

Posted on: 2022-07-22  Degree: Master  Type: Thesis
Country: China  Candidate: X P Zhuang
GTID: 2518306335966519  Subject: Control Science and Engineering
Abstract:
Object detection aims to classify specific objects in a scene and simultaneously localize their bounding boxes. In recent years, with the spread of deep convolutional networks, both the accuracy and the efficiency of object detection have improved, and it is now widely used in fields such as surveillance and intelligent medicine. Among these applications, demand for video object detection is broader than for image object detection, so research on video object detection has both academic value and practical relevance.

Video object detection still faces several challenges: the imbalance between positive and negative samples lowers information processing efficiency; the model's large computational requirements make it poorly suited to embedded devices; and detection performance declines under abnormal conditions in video. This thesis studies video object detection based on a multi-attention mechanism to address these challenges. By introducing multi-scale attention, spatial attention, and a local attention sequence module, combined with pruning and distillation, we achieve high-precision, high-efficiency, and highly reliable video object detection.

To address the low information processing efficiency caused by the imbalance between positive and negative samples, we propose a multi-scale attention module and a spatial attention module. These modules enable the model to adaptively select the more important information along the feature and spatial dimensions, alleviating the imbalance problem and improving detection accuracy. Experimental analysis shows that the two modules together yield a relative accuracy improvement of 4.4%. Visualizing the attention distribution shows that the attention modules select the more important information, which improves overall performance.

To address the model's large computational requirements, which hinder deployment on embedded devices, we propose a channel pruning and distillation compression method for object detection. It adaptively selects high-importance channels, prunes redundant parameters, and recovers accuracy through distillation. It effectively reduces the number of parameters and the amount of computation while maintaining reasonable accuracy, and the inference is then optimized for deployment. On the VOC dataset, the method reduces the parameters and the computation by 86.2% and 89.7%, respectively, with an accuracy loss of only 6.3%. Deployed on a TX2, it increases inference speed by about 2 times.

To address abnormal conditions in video such as occlusion, blurring, and loss of focus, we propose a local attention sequence model and reduce the parameters and computation of ConvGRU. The model processes spatial and temporal information in video more efficiently and improves detection performance under abnormal conditions. Experiments show that the modified ConvGRU and the local attention sequence model improve detection accuracy by 5.3%. The visualization results show that the method adapts to different abnormal conditions, improving the reliability of video object detection.
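The channel selection step in the compression method can be illustrated with a minimal sketch. The abstract does not state how channel importance is measured, so the code below assumes the L1 norm of each output channel's weights as the score; that criterion, the function names (`channel_importance`, `prune_channels`, `reduction`), and the toy weights are all hypothetical and are not taken from the thesis.

```python
def channel_importance(filters):
    """Score each output channel by the L1 norm of its weights (assumed criterion)."""
    return [sum(abs(w) for w in f) for f in filters]


def prune_channels(filters, keep_ratio):
    """Keep the highest-scoring fraction of channels, preserving their original order."""
    scores = channel_importance(filters)
    n_keep = max(1, round(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return kept, [filters[i] for i in kept]


def reduction(before, after):
    """Fraction of parameters removed by pruning."""
    return 1.0 - after / before


# Toy layer (made-up numbers): 4 output channels, 3 flattened weights each.
filters = [
    [0.9, -0.8, 0.7],
    [0.01, 0.02, -0.01],
    [0.5, 0.4, -0.6],
    [0.0, 0.03, 0.0],
]

kept, pruned = prune_channels(filters, keep_ratio=0.5)
params_before = sum(len(f) for f in filters)
params_after = sum(len(f) for f in pruned)
print("kept channels:", kept)
print("parameter reduction:", reduction(params_before, params_after))
```

In a real detector, removing an output channel also removes the matching input channel of the next layer, which is why parameter and computation reductions can differ, as they do in the figures reported above. The distillation step that follows pruning is not shown here.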
Keywords: video object detection, deep convolutional network, attention mechanism, channel pruning, knowledge distillation, sequence model