
Target Detection Based On Multi-modal Images

Posted on: 2022-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z Shi
Full Text: PDF
GTID: 2518306527983089
Subject: Computer Science and Technology
Abstract/Summary:
With the advancement of science and technology and the rapid development of computer vision, target detection based on deep learning has been applied more and more widely, showing vigorous vitality in many fields. At the same time, increasingly diverse forms of data have emerged, which brings difficulties to target detection tasks. Visible-light single-modal data cannot provide sufficient information in some complex scenes, so target detection based on visible-light images alone performs poorly. The increasingly diverse data formats, however, also offer a turning point for this problem: it is possible to improve detection performance by fusing image data of different modalities. The fusion method for images of different modalities still needs to be studied. How to make a multi-modal detection model better combine features from different modalities and achieve a better detection effect has become a hot research topic, with high value in both theory and practice. At the same time, although multi-modal data brings an increase in accuracy, it also reduces real-time performance, because the network model becomes more complex and occupies considerable computing resources. How to reduce the model size and improve the real-time performance of the algorithm without sacrificing accuracy is therefore a direction worth studying. This dissertation focuses on target detection algorithms based on multi-modal images and achieves the following results.

Firstly, pedestrian detection algorithms based on visible-light single-modal images perform poorly under insufficient illumination at night. Based on the YOLO algorithm, we construct a pedestrian detection model that takes visible-infrared bimodal images as input. Experiments were carried out on the different stages at which multi-modal features can be fused: data-level fusion, feature-level fusion, and decision-level fusion. These fusion schemes ignore the multi-scale nature of the fused feature maps, so, based on the feature maps of different scales generated by Darknet, the feature extraction network of the YOLO algorithm, we perform multi-modal image feature fusion at each scale and carry out multi-scale pedestrian detection. We thus propose a multi-modal fusion pedestrian detection algorithm based on YOLO. By comparing it with the visible-light single-modal pedestrian detection baseline on a public standard multi-modal pedestrian detection dataset, we demonstrate the effectiveness of using multi-modal images for target detection and preliminarily determine the architecture of the YOLO-based multi-modal fusion pedestrian detection algorithm.

Then, noting that other pedestrian detection algorithms based on multi-modal image fusion use direct concatenation during feature fusion and thus ignore the differences between modalities, we carefully design a multi-modal weighted fusion module that assigns a different weight to each modality's features, and introduce the CBAM attention mechanism. Ablation experiments on the modal weighted fusion layer and the attention mechanism, conducted on the public standard multi-modal pedestrian detection dataset, prove the effectiveness of the proposed weighted fusion module combined with attention and further improve the accuracy of the proposed YOLO-based multi-modal weighted fusion pedestrian detection algorithm. Comparison with other multi-modal pedestrian detection algorithms likewise confirms the effectiveness of the proposed algorithm.

Finally, because it uses multi-modal data, the network model of the YOLO-based multi-modal weighted fusion pedestrian detection algorithm becomes larger and occupies considerable computing resources, which leads to a lack of real-time performance of the algorithm
caused by the decrease in detection speed. We therefore attempt to lighten the model: two light-weighting ideas, depthwise separable convolution and channel shuffling, were introduced and experimented with, and their advantages and disadvantages were compared. The proposed lightweight multi-modal weighted fusion pedestrian detection algorithm reduces the size and computational complexity of the model without significantly reducing detection accuracy, effectively improving the real-time performance of the algorithm.

To sum up, in order to prove the effectiveness of target detection based on multi-modal images relative to target detection based on visible-light single-modal images, this dissertation experimented with a variety of feature fusion methods at different stages; following the multi-scale characteristics of the feature maps extracted by Darknet, the feature extraction network of the YOLO algorithm, multi-modal feature fusion is carried out at different scales. An attention mechanism was then introduced, and a modal weighted fusion module was designed to perform weighted fusion of features from different modalities, yielding a multi-modal weighted fusion pedestrian detection algorithm based on YOLO whose effectiveness is verified through experiments. In addition, a lightweight version of the algorithm was developed to improve the real-time performance of pedestrian detection without significantly reducing accuracy, and certain results have been achieved.
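To make the modal weighted fusion idea concrete, the following is an illustrative NumPy sketch, not the thesis's actual implementation: the function names, the scalar per-modality logits, and the fixed sigmoid gate standing in for CBAM's learned channel-attention MLP are all assumptions made here for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def channel_attention(feat):
    """CBAM-style channel attention: squeeze the spatial dimensions by
    average and max pooling, combine, and rescale each channel. (A fixed
    sigmoid gate is used here; CBAM learns a small shared MLP instead.)"""
    avg = feat.mean(axis=(1, 2))                 # (C,)
    mx = feat.max(axis=(1, 2))                   # (C,)
    scale = 1.0 / (1.0 + np.exp(-(avg + mx)))    # sigmoid gate, (C,)
    return feat * scale[:, None, None]

def weighted_fusion(rgb_feat, ir_feat, modal_logits):
    """Fuse two modality feature maps of shape (C, H, W) with learned
    scalar modality weights, then refine with channel attention."""
    w = softmax(np.asarray(modal_logits, dtype=float))  # weights sum to 1
    fused = w[0] * rgb_feat + w[1] * ir_feat
    return channel_attention(fused)

rgb = np.random.rand(4, 8, 8)   # toy visible-light feature map
ir = np.random.rand(4, 8, 8)    # toy infrared feature map
out = weighted_fusion(rgb, ir, [0.3, -0.1])
print(out.shape)                # (4, 8, 8)
```

Unlike direct concatenation, the weighted sum keeps the fused feature map at the same channel count while letting the network down-weight the less informative modality (e.g. visible light at night).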
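The two light-weighting ideas can likewise be sketched. Assuming (hypothetically) plain NumPy tensors in (C, H, W) layout, channel shuffle is a pure reshape-transpose, and the benefit of depthwise separable convolution shows up directly in the parameter count; the helper names below are illustrative, not from the thesis.

```python
import numpy as np

def channel_shuffle(feat, groups):
    """ShuffleNet-style channel shuffle: view the channels as
    (groups, C // groups), transpose, and flatten back, so information
    mixes across the groups of a preceding grouped convolution."""
    c, h, w = feat.shape
    assert c % groups == 0
    return (feat.reshape(groups, c // groups, h, w)
                .transpose(1, 0, 2, 3)
                .reshape(c, h, w))

def separable_conv_params(c_in, c_out, k):
    """Parameter counts of a standard k x k convolution versus a depthwise
    separable one (depthwise k x k + pointwise 1 x 1), biases ignored."""
    standard = c_in * c_out * k * k
    separable = c_in * k * k + c_in * c_out
    return standard, separable

std, sep = separable_conv_params(256, 256, 3)
print(std, sep)  # 589824 67840, roughly 8.7x fewer parameters
```

This parameter reduction is what shrinks the multi-modal model and speeds up inference, while channel shuffle keeps grouped convolutions from isolating information inside each group.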
Keywords/Search Tags:Multi-modality, Pedestrian Detection, Feature Fusion, Attention Mechanism