Research On The Semantics Driven Feature Fusion Strategies And Their Applications

Posted on: 2023-02-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Zhai
Full Text: PDF
GTID: 1522307025465014
Subject: Control Science and Engineering
Abstract/Summary:
Feature learning is crucial to the effectiveness of computer vision perception algorithms, and feature fusion is an essential technique for improving feature representation capability. Existing visual perception methods still model scene semantics insufficiently and rely on naive fusion strategies, so they cannot meet the perception requirements that complex scenes place on intelligent systems. By developing visual semantic models of complex scenes, this dissertation explores how Transformer-based and graph-based models can enhance knowledge transfer and cognitive reasoning and thereby improve visual perception accuracy. The main research contents and innovations are as follows.

First, existing methods ignore the semantic consistency between features when modeling the global context of a scene, which leads to sub-optimal cross-view retrieval performance. To address this, the dissertation uses a deep clustering network to establish a global commonality semantic model of cross-modal views, interactively fuses and enhances the commonality semantic model and the individualized features of each view under a Transformer-based framework, and extracts modality-invariant representations to perform cross-view retrieval. Experiments on two challenging datasets show that the algorithm achieves the highest cross-view retrieval accuracy among methods of the same period.

Second, existing deterministic modeling methods ignore the inherent uncertainty that texture similarity introduces during inference. The dissertation therefore develops an uncertainty quantification network based on a Bayesian neural network, designs a prototyping semantic model to capture the local semantics of the object, and presents an enhanced mapping between local and global semantics under a Transformer-based framework to perform camouflaged object detection. Moreover, the dissertation demonstrates that edges are the areas with higher uncertainty during inference, and develops an aggregated semantic model that reframes grid features into a graph-based space. By exploiting the complementarity between object edges and regions, the proposed model accounts for feature interaction between the two subtasks in the graph space and effectively distinguishes the foreground and background semantics of an object. On three public datasets, the proposed algorithms achieve the highest camouflaged object detection accuracy among methods of the same period.

Lastly, existing methods for crowd density estimation in multi-view scenes lose performance because of feature mismatch between views and errors in estimating geometric affine relationships. The dissertation proposes a collaborative communication graph convolution method that completes intra-view reasoning and inter-view communication in the aggregated semantic graph-based feature space, fully utilizes the mutual guidance information between multi-camera views, and fuses features without prior information about scene structure. The algorithm is validated on three public scene datasets and achieves the best crowd density estimation accuracy among methods of the same period.

The research in this dissertation further improves the perception ability of modern intelligent systems and has theoretical significance and practical value for promoting their iterative upgrading. Its results can be applied to autonomous driving, smart medical treatment, smart agriculture, robotics, and other fields.
Keywords/Search Tags:Deep visual reasoning, Semantic modeling, Feature fusion and interaction, Object perception, Scene understanding