Research On The Semantics Driven Feature Fusion Strategies And Their Applications

Posted on:2023-02-20

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Q Zhai

GTID:1522307025465014

Subject:Control Science and Engineering

Abstract/Summary:

Feature learning is crucial to the effectiveness of computer vision perception algorithms, and feature fusion is an essential technique for improving feature representation capability. Existing visual perception methods still model scene semantic information insufficiently, and their fusion strategies remain naive, so they cannot meet the perception requirements that complex scenes place on intelligent systems. By developing visual semantic models of complex scenes, this dissertation explores how transformer-based and graph-based models can enhance knowledge transfer and cognitive reasoning and improve visual perception accuracy. The main research contents and innovations are as follows.

Firstly, existing methods ignore the semantic consistency between features when modeling the global context of a scene, which leads to sub-optimal cross-view retrieval performance. To address this, the dissertation uses a deep clustering network to establish a global commonality semantic model of cross-modal views, realizes the interactive fusion and enhancement of the commonality semantic model and the individualized features of each view under a transformer-based framework, and extracts model-invariant representations to complete cross-view retrieval. Experiments on two challenging datasets show that the algorithm achieves the highest cross-view retrieval accuracy among methods of the same period.

Secondly, existing deterministic modeling methods ignore the inherent uncertainty caused by texture similarity during inference. To address this, the dissertation develops an uncertainty quantification network based on a Bayesian neural network, designs a prototyping semantic model to capture the local semantics of the object, and presents an enhanced mapping between local and global semantics under a transformer-based framework to realize camouflaged object detection. Moreover, the dissertation demonstrates that edges are the areas of higher uncertainty during inference, and develops an aggregated semantic model that reframes the grid features into a graph-based space. By exploiting the complementarity between object edges and regions, the proposed model accounts for the feature interaction of the two subtasks in the graph space and can effectively distinguish the foreground and background semantics of an object. On three public datasets, the proposed algorithms surpass the most accurate camouflaged object detection methods of the same period.

Lastly, existing methods for crowd density estimation in multi-view scenes lose performance because of feature mismatch between views and errors in estimating the geometric affine relationship. To address this, the dissertation proposes a collaborative communication graph convolution method. The method completes intra-view reasoning and inter-view communication in the aggregated semantic graph-based feature space, fully utilizes the mutual guidance information between multi-camera views, and realizes feature fusion without prior information about the scene structure. The dissertation verifies the effectiveness of the algorithm on three public scene datasets, where it achieves the best crowd density estimation accuracy among methods of the same period.

The research in this dissertation is intended to further improve the perception ability of modern intelligent systems, and it has theoretical significance and practical value for promoting their iterative upgrading. The results can be applied to autonomous driving, smart medical treatment, smart agriculture, robotics, and other fields.

Keywords/Search Tags:

Deep visual reasoning, Semantic modeling, Feature fusion and interaction, Object perception, Scene understanding

Related items

1	Dynamic 3D Perception,Understanding And Visualization Of UAV Under DVE
2	Multi-type Object Detection For Large-field Remote Sensing Images Based On Deep Visual Perception Modeling
3	Research On Campus Environment Scene Understanding Based On Laser Point Cloud And Visual Image Fusion
4	Research On Cognition Of Complex Environment And Scene Understanding Based On Multi-sensor Information Fusion
5	Research On Autonomous Driving Road Scene Understanding Algorithm Based On Deep Learning
6	Scene Understanding And Perception In 3D Environment
7	Research On Visual Recognition Technology Of Sea Surface Targets Based On Deep Learning
8	Infrared-visible Light Nighttime Vehicle Scene Parsing Based On Adversarial Guided Fusion And Attention Segmentatio
9	The Research On Road Scene Semantic Segmentation Method Based On Deep Learning
10	Research On Traffic Scene Image Recognition And Semantic Understanding Method Based On Deep Learning