
Research On Deep Neural Network For Object Detection From Multi-modal Images

Posted on: 2024-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Meng
Full Text: PDF
GTID: 2568307136489514
Subject: Control Science and Engineering
Abstract/Summary:
Object detection plays an important role in intelligent surveillance, autonomous driving, and medical diagnosis. However, a single RGB camera provides limited information, because it captures only the optical features of a scene in the visible wavelength band. As technology develops and application requirements grow, different types of sensors have been adopted to capture multimodal data, and both the variety and the volume of data have increased dramatically. How to fuse the features of different modalities effectively while keeping the network lightweight has become a key problem in research on multimodal object detection algorithms. In addition, scale inconsistency in multimodal images introduces feature differences, which lead to false detections and missed detections even when the detection network faces relatively simple targets. Solving these problems has therefore become an active area of research on multimodal object detection. This thesis studies multimodal feature fusion object detection networks for autonomous driving and medical image detection to address the above problems. The main results are as follows:

(1) To fuse features from multimodal data more effectively, this thesis proposes a multimodal feature fusion block. The block combines learnable weighted fusion with a coordinate attention mechanism to make better use of the complementary information in images of different modalities, and it further enlarges the receptive field by introducing dilated convolutions into the existing spatial pyramid pooling structure. This design improves detection performance while keeping the network structure lightweight. In addition, to detect small objects more effectively, we propose an improved CIoU loss function and demonstrate its effectiveness experimentally. Experiments on two datasets, FLIR and LLVIP, show that the proposed MFF-YOLO network achieves advanced detection performance and better robustness under complex illumination and pedestrian scale variations.

(2) To reduce the missed detections and false detections caused by objects of inconsistent scale, this thesis builds on MFF-YOLO and proposes an improved detection network, MTri-YOLO, based on the multi-scale feature sharing mechanism Tri-PAN. The Tri-PAN module fuses features from three adjacent layers to improve detection performance. The network also further improves the spatial pyramid pooling structure so that it fuses object location information with high-level semantic information while enlarging the receptive field. This thesis additionally tests six different spatial pyramid pooling structures and three different upsampling methods, and compares and analyzes their performance differences. Experimental results on FLIR and LLVIP show that the improved MTri-YOLO network achieves the best detection performance in scenes with complex illumination and pedestrian scale changes, demonstrating the effectiveness of a detection algorithm based on multimodal feature fusion and multi-scale feature sharing.

(3) To address the lack of intracranial magnetic resonance imaging (MRI) datasets for object detection and multimodal image fusion, and in view of the difficulty of distinguishing glioblastoma from brain metastases in diagnosis, this thesis compiles and publishes a public dataset of intracranial multimodal MRI images covering metastases and gliomas (https://github.com/mzzjuve/BrainmriDataset), and uses the proposed MTri-YOLO network to discriminate between metastases and gliomas. The experimental results show that the network detects effectively and quickly, which can support preliminary pathology detection and contributes to the field of medical multimodal object detection. Finally, to offer options for different hardware conditions, this thesis compares four sizes of the MTri-YOLO structure, reporting the number of parameters, detection metrics, and detection speed of each.
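
For readers unfamiliar with the loss mentioned in (1), the following is a minimal sketch of the standard CIoU loss that the thesis takes as its starting point. The abstract does not describe the improved variant, so this shows only the commonly used baseline formulation; the function name `ciou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabiliser are assumptions made for illustration, not details from the thesis.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """Baseline CIoU loss for two boxes in (x1, y1, x2, y2) format.

    Hypothetical helper for illustration; not the thesis's improved loss.
    """
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Overlap term: intersection over union of the two boxes.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter + eps
    iou = inter / union

    # Distance term: squared distance between box centres, divided by the
    # squared diagonal of the smallest box enclosing both.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 \
        + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2
    enclose_w = max(x2, gx2) - min(x1, gx1)
    enclose_h = max(y2, gy2) - min(y1, gy1)
    c2 = enclose_w ** 2 + enclose_h ** 2 + eps

    # Aspect-ratio term: penalises a mismatch between width/height ratios.
    v = (4 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


if __name__ == "__main__":
    print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes
```

In this sketch, identical boxes give a loss close to zero, while disjoint boxes give a loss above 1 that grows as the centres move apart. Unlike plain IoU, the distance term therefore still provides a training signal when the boxes do not overlap, which is one reason CIoU-style losses are commonly used in YOLO-family detectors.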
Keywords/Search Tags:Object detection, multimodal, deep learning, autonomous driving, feature fusion