| Multi-object tracking is a critical task in computer vision, and its broad applications in domains such as autonomous driving, military, and aerospace have garnered significant attention. In recent years, researchers have proposed numerous innovative approaches to this task and made notable progress. Among them, one-shot online multi-object tracking methods have become mainstream owing to their strong real-time performance, accuracy, and robustness. However, the inference network of these methods is tightly integrated and difficult to optimize part by part, which limits their ability to resolve internal conflicts in feature demands, degrades performance, and prevents them from matching traditional two-stage approaches. To overcome this challenge, we study how to resolve feature demand conflicts in one-shot online multi-object tracking methods and obtain a series of positive results. The main contributions of this paper are as follows: (1) In the inference network of one-shot approaches, conflicts arise among the semantic feature demands of different branches. To alleviate these conflicts, we propose a semantic disentanglement method, which separates different information semantics and reorganizes them into distributions suited to the demands of each branch. This framework comprises a feature recoupling module, which reorganizes the semantics of the extracted features, and a feature differentiating module, which supplies dedicated feature information tailored to the demands of each sub-branch. Furthermore, because the Re-ID branch depends heavily on shallow-level features and outputs a high-dimensional embedding representation, we design a re-globalization module to re-enrich the shallow-level semantic features that are lost during deep network processing. Experimental results show that this method outperforms existing approaches and effectively resolves the complex entanglement of feature semantics, significantly improving system performance. (2) The conflicts in feature semantic demands arise primarily from the network shared among branches. To address this issue, we propose a novel approach that increases the independence of the branches and thereby reduces the conflicts. Our method treats detection as a generative process with conditioning control. By leveraging the detection results from the previous frame, the detection branch's demand for original information is minimized, which allows the feature extraction network to be optimized more for the Re-ID branch. To our knowledge, this study is the first to introduce the denoising diffusion implicit model into the multi-object tracking domain. Our method comprises information compression, latent space diffusion, denoising, and mapping back to the embedding representation. Furthermore, we develop an efficient denoising module that speeds up denoising to meet real-time requirements. Experimental results demonstrate that the multi-object tracking method based on the denoising diffusion implicit model achieves performance comparable to traditional methods. |
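
To make the diffusion and denoising stages of contribution (2) more concrete, the following is a minimal sketch of the generic deterministic update of a denoising diffusion implicit model (with the stochasticity parameter set to zero) on a small latent vector. It is not the architecture proposed here: the information compression, the conditioning on previous-frame detections, and the efficient denoising module are not shown. The linear noise schedule, the function names, and the `eps_model` callable are assumptions introduced only for illustration.

```python
import math


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def forward_diffuse(x0, alpha_bar, noise):
    """Noise a clean latent x0 to the level given by alpha_bar."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + s * n for x, n in zip(x0, noise)]


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0)."""
    a_t, s_t = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    # Estimate the clean latent implied by the predicted noise.
    x0_pred = [(x - s_t * e) / a_t for x, e in zip(x_t, eps_pred)]
    # Re-noise that estimate to the earlier (less noisy) level.
    a_p, s_p = math.sqrt(alpha_bar_prev), math.sqrt(1.0 - alpha_bar_prev)
    return [a_p * x0 + s_p * e for x0, e in zip(x0_pred, eps_pred)]


def ddim_sample(x_T, eps_model, alpha_bars, timesteps):
    """Run DDIM over a decreasing subsequence of timesteps.

    eps_model is a hypothetical noise predictor taking (x, t).
    """
    x = list(x_T)
    for i, t in enumerate(timesteps):
        # After the last listed timestep, the target is the clean latent (alpha_bar = 1).
        prev_bar = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x = ddim_step(x, eps_model(x, t), alpha_bars[t], prev_bar)
    return x


# Usage: five denoising steps instead of one thousand, with a perfect
# (oracle) noise predictor standing in for a trained network.
alpha_bars = make_alpha_bars(1000)
x0 = [0.5, -1.0, 2.0]
noise = [0.3, -0.7, 1.1]
x_T = forward_diffuse(x0, alpha_bars[999], noise)
recovered = ddim_sample(x_T, lambda x, t: noise, alpha_bars, [999, 749, 499, 249, 0])
```

Because the update is deterministic, it can skip most timesteps of the schedule, which is one common reason implicit models are chosen when inference speed matters. In practice, the oracle predictor would be replaced by a learned network.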