Research On Open-Set Object Detection Method Based On Multi-Modal Learning

Posted on:2024-11-16

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Z Y Ma

GTID:1528307373971029

Subject:Computer Science and Technology

Abstract/Summary:

As a critical research direction in computer vision, object detection has a wide range of applications in intelligent transportation, video surveillance, autonomous driving, and other areas. Existing methods typically address only a single setting, such as cross-domain detection or new-class detection, which leaves them unable to handle open-world object detection. With the rise of multi-modal learning, artificial intelligence models have begun to incorporate multi-modal data to meet the challenges of open-world settings, greatly enhancing their understanding and cognition of the real world. This dissertation applies multi-modal learning to object detection, using the rich category information inherent in image and text data for background modeling or dictionary learning, which helps downstream object detection models improve detection performance. Specifically, background modeling can effectively alleviate the background shift problem in cross-domain detection, while dictionary learning can define unlabeled categories in new-class detection. Experiments in two scenarios, multi-object recognition in road scenes and marine biodiversity monitoring, confirm the role of multi-modal learning in alleviating the challenges of cross-domain and new-class detection. The main research contributions of this dissertation are as follows:

(1) This dissertation proposes a domain-generalization object detection method based on multi-modal representation alignment, aiming to enhance the model's ability to detect targets in completely unknown new domains. It presents an object detection method based on text category embeddings, leveraging the uniqueness of text category embeddings to construct a visual feature classifier guided by text semantics. A pre-trained multi-modal representation alignment model is introduced to help fully model the background for downstream tasks. In addition, representation consistency learning and domain adversarial learning modules improve how well the model learns from the source domain, increasing detection accuracy in completely unknown domains.

(2) This dissertation proposes an open-environment domain adaptation method based on foundation models, aiming to enhance the model's detection capability in unlabeled new domains. Introducing a vision-language pre-trained foundation model increases category granularity and improves the model's ability to adapt to complex data distributions. A hierarchical feature alignment strategy maps the features of the source and target domains into the same semantic space. In multi-source and multi-target domain settings, the corresponding challenges are addressed through cross-reconstruction and by freezing certain parameters, which effectively alleviates knowledge forgetting while also improving cross-domain detection accuracy.

(3) This dissertation proposes an open-world object detection method that is closer to practical applications, considering cross-domain and new-class detection scenarios at the same time. Existing single-stage frameworks cannot address this challenging task. The proposed method adopts a two-stage training framework: pre-training builds an instance dictionary and establishes connections between annotated and unannotated classes, enabling the detection of new-class targets in downstream tasks. During the main training stage, domain adversarial training further narrows the domain gap, reduces missed detections under scene changes, and improves the accuracy of new-class detection in cross-domain scenarios.

(4) This dissertation presents a new-class object detection approach based on joint visual-text pre-training, aiming to detect unannotated new categories in ocean datasets. It also proposes the MarineDet dataset for object detection in ocean scenes, comprising over 20,000 images, 26 major categories, and 821 subcategories. The dissertation transfers knowledge learned from land scenes to the ocean domain by fine-tuning on the ocean data. Detailed comparisons with existing open-vocabulary and fully supervised object detection algorithms validate the potential of cross-domain data for ocean tasks and show further gains in new-class detection accuracy in ocean scenes.

Finally, the dissertation summarizes the above research, offers an outlook on the future of object detection, and identifies potential directions for further in-depth research.
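
The text-guided visual classifier described in contribution (1) can be illustrated with a minimal sketch. It assumes that each category name has already been encoded into a vector by some text encoder (not shown; the 3-dimensional vectors below are made up), and it scores a region feature by temperature-scaled cosine similarity against those vectors. The function names, the temperature of 0.01, and the explicit "background" entry are hypothetical choices for illustration, not the dissertation's implementation.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm; a zero vector maps to all zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def text_guided_logits(
    region_feature: Sequence[float],
    class_text_embeddings: Dict[str, Sequence[float]],
    temperature: float = 0.01,  # hypothetical value, not from the source
) -> Dict[str, float]:
    """Score one region feature against every category's text embedding.

    The text embeddings act as fixed classifier weights, so each logit is
    the cosine similarity between the region feature and a category
    embedding, divided by a temperature.
    """
    v = l2_normalize(region_feature)
    logits: Dict[str, float] = {}
    for name, emb in class_text_embeddings.items():
        t = l2_normalize(emb)
        logits[name] = sum(a * b for a, b in zip(v, t)) / temperature
    return logits


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over a dict of logits."""
    peak = max(logits.values())
    exps = {k: math.exp(val - peak) for k, val in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


if __name__ == "__main__":
    # Toy 3-d "text embeddings"; real ones would come from a text encoder.
    embeddings = {
        "car": [1.0, 0.0, 0.0],
        "person": [0.0, 1.0, 0.0],
        "background": [0.0, 0.0, 1.0],  # hypothetical background entry
    }
    probs = softmax(text_guided_logits([0.9, 0.1, 0.0], embeddings))
    print(max(probs, key=probs.get))  # prints "car"
```

Because the classifier weights in this sketch are text embeddings rather than learned per-class parameters, an extra category can in principle be scored by adding its name's embedding to the dictionary without retraining; the abstract does not state whether the dissertation's method works exactly this way.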

Keywords/Search Tags:

Object detection, cross domain detection, autonomous driving, marine object detection
