| Vision-based target tracking algorithms are easily disturbed by occlusion, large changes in target state, and other factors. Audio-based sound source localization algorithms can measure the position of a sound source, but they are easily disturbed by environmental noise and indoor reverberation. Fusing the two results yields more robust tracking. To improve the ability of hearing-impaired people to perceive their environment, an object detection and tracking algorithm based on audio-visual information fusion is proposed. The system acquires video and audio of the target, detects and tracks the target in the visual domain, localizes the sound source in the audio domain, and finally fuses the two tracking results at the decision level to obtain an audio-visual tracking result. The research contents of this paper are as follows: (1) An object detection and tracking system based on audio-visual information fusion is designed. The system collects visual and audio information through hardware devices and transmits the signals to the host computer, which runs the audio-visual fusion detection and tracking algorithm to track and localize the object. (2) The hardware of the system is designed and built. It comprises a visual acquisition module, an audio acquisition module, a data transfer module, and the host computer. (3) The algorithm framework of the system is designed. It comprises a vision-based object detection and tracking module, an audio-based sound source localization module, and an audio-visual fusion tracking module. The vision-based module is implemented with YOLOv5m and the unscented Kalman filter. In the sound source localization module, a four-element cross-shaped microphone array is designed to capture audio, and the formula for the sound source azimuth is derived. The fusion module uses an importance particle filter as the information fusion tool, constructs an audio-visual likelihood function and an audio-visual importance sampling function, and fuses the visual tracking and sound source localization results at the decision level to obtain the fused tracking result. (4) The system is tested in a complex indoor environment. The results show that the tracking accuracy of the algorithm exceeds 90%, which is higher than that of the single-modality algorithms. |
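
The abstract mentions an azimuth formula for a four-element cross microphone array but does not state it. As a rough illustration only, the sketch below shows the standard far-field relation for such a geometry, which may differ from the formula derived in the paper. It assumes two orthogonal microphone pairs with equal spacing, a plane-wave source, and a sign convention in which each delay is the arrival time at the negative-axis microphone minus the arrival time at the positive-axis microphone. The function names and the speed-of-sound constant are hypothetical choices made for this example.

```python
import math

# Assumed speed of sound in air at room temperature (m/s); not from the source.
SPEED_OF_SOUND = 343.0


def expected_delays(azimuth, spacing, c=SPEED_OF_SOUND):
    """Delays (seconds) a far-field plane wave arriving from `azimuth`
    (radians) would produce on the x-axis and y-axis microphone pairs,
    where `spacing` is the distance (metres) between the two mics of a pair."""
    tau_x = spacing * math.cos(azimuth) / c
    tau_y = spacing * math.sin(azimuth) / c
    return tau_x, tau_y


def azimuth_from_delays(tau_x, tau_y):
    """Far-field azimuth (radians, in (-pi, pi]) from the two pair delays.
    With equal spacing on both axes, spacing and sound speed cancel out."""
    return math.atan2(tau_y, tau_x)


# Round trip: simulate delays for a source at 60 degrees, then recover it.
tau_x, tau_y = expected_delays(math.radians(60.0), spacing=0.1)
recovered_deg = math.degrees(azimuth_from_delays(tau_x, tau_y))
```

In practice the delays would come from a time-delay estimator applied to the recorded channels, and noise and reverberation would make them inconsistent with any single angle, which is one motivation for fusing the audio estimate with the visual track.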