
Research On Deep Learning Object Detection Technology Based On Multi-Scale Feature Fusion

Posted on: 2024-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xu
Full Text: PDF
GTID: 2568307115978519
Subject: Control Science and Engineering
Abstract/Summary:
Object detection is a core technology in computer vision and image processing; its task is to locate and classify semantic objects of specific categories in digital images and videos. The development of deep learning, and in particular the success of convolutional neural networks in the image field, has brought new research ideas to object detection and driven industrial applications in areas such as agriculture and intelligent driving. However, because detected objects vary in appearance, scale, and shape, it is difficult for a model to extract object information at different scales, which leads to inaccurate detection results and the loss of detailed information. To use the detected feature information effectively, this thesis focuses on enhancing the spatial information and regions of interest of features through feature fusion, nested pyramids, and attention mechanisms. For the 2D detection task, a multi-scale prediction network is selected as the baseline detector and different fusion modules are added to detect pests, sheep, and other targets. For the 3D detection task, features from two modalities, image and point cloud, are fused to detect cars. The main research contents are as follows:

(1) To address the lack of interaction between low-level and high-level feature maps in multi-scale prediction models, which causes low detection accuracy and poor object localization, an improved multi-scale prediction model based on adaptive fusion is proposed. First, the shallow feature layer is reduced by convolution and the deep feature layer is enlarged by deconvolution, and the layers adjacent to each prediction layer are fused to strengthen the information complementarity between different feature layers. Second, feature layers of the same scale provide feature information over different ranges, transferring specific features rich in detail to abstract features rich in semantics. At the same time, global average pooling is used to guide learning, further enlarging the receptive field of the input features so that the network retains more feature information. Experiments show that the improved model achieves an average precision of 80.6% on the PASCAL VOC 2007 dataset at 60.9 frames per second on an RTX 2080 Ti, effectively improving detection accuracy across target categories.

(2) To overcome the shortcomings of traditional small-object pest detection, namely complex image processing, low recognition accuracy, and weak generalization, an improved small-object detection network based on nested pyramid fusion is proposed. In the first stage, a multi-scale channel aggregation module learns the weight relationships between channels in order to capture rich pest appearance features and suppress useless background information. In the second stage, a shallow feature map is added to the backbone feature extraction network, and a feature pyramid with top-down and bottom-up paths is constructed recursively; a nested residual enhancement module improves the network's ability to perceive global information and produces feature maps with precise localization and strong semantics. In the third stage, the default-box parameters are optimized to match the effective receptive field, reflecting the particularities of the small-object dataset and strengthening the network's ability to identify objects. In comparative experiments on a pest dataset and a public dataset, the proposed method achieves 97.08% and 81.31% accuracy respectively, and effectively recovers the category and location of small targets in an image.

(3) To address the loss of depth information caused by inaccurate monocular estimation, a monocular 3D detection model based on multi-task learning and cross-view fusion is proposed. First, two parallel CNNs are built by combining depth estimation and 2D detection, using multi-scale backbone networks to extract depth and image features. Then, a cross-view attention fusion module aggregates local feature cues from the depth map and the image to construct the correlation between the two views. Finally, a confidence prediction module is introduced into the 3D detection head to learn the difference between the predicted box and the ground-truth box. Experiments on the car category of the KITTI test set show that the improved model significantly improves detection performance at all difficulty levels.
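The adjacent-layer adaptive fusion described in contribution (1) can be sketched roughly as follows. This is a toy illustration only: the thesis does not give exact layer configurations, so average pooling stands in for the stride-2 convolution, nearest-neighbour repetition stands in for the deconvolution, the SE-style sigmoid gate is an assumed form of the global-average-pooling guidance, and all channel counts and shapes are made up.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_adjacent(shallow, mid, deep):
    """Toy adjacent-layer fusion on C x H x W feature maps (assumed layout).

    The shallow map (2x resolution of ``mid``) is average-pooled down, the
    deep map (0.5x resolution) is upsampled, the three maps are summed, and
    a global-average-pooling gate re-weights the fused channels.
    """
    c, h, w = mid.shape
    # 2x downsampling of the shallow map (substitute for the stride-2 conv)
    down = shallow.reshape(c, h, 2, w, 2).mean(axis=(2, 4))
    # 2x nearest-neighbour upsampling of the deep map (substitute for deconv)
    up = deep.repeat(2, axis=1).repeat(2, axis=2)
    fused = down + mid + up
    # GAP-guided channel gate: one weight per channel, broadcast over H x W
    gate = sigmoid(fused.mean(axis=(1, 2), keepdims=True))
    return fused * gate

out = fuse_adjacent(np.ones((4, 8, 8)), np.ones((4, 4, 4)), np.ones((4, 2, 2)))
print(out.shape)  # (4, 4, 4) -- fusion preserves the prediction layer's shape
```

In a trained network the resampling and gating would of course be learned convolutions rather than these fixed operators; the sketch only shows how the three resolutions are brought to a common shape before the gate is applied.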
Keywords/Search Tags: object detection, adaptive fusion, nested pyramid, multi-task learning, cross-view attention fusion