Research On Speaker Tracking Algorithm Based On Fusion Of Audio And Video Information

Posted on: 2023-08-12    Degree: Master    Type: Thesis
Country: China    Candidate: Z C Xiong    Full Text: PDF
GTID: 2568307031492484    Subject: Electronic and communication engineering
Abstract/Summary:
Speaker tracking algorithms based on the fusion of audio and video information have attracted increasing attention due to their potential applications in video conferencing, individual speaker discrimination, surveillance, and monitoring, among others. In practice, audio-video fusion tracking faces many challenges, including the fusion of multi-modal information, estimation of the time-varying number of speakers and their states, and handling of tracking errors under conditions such as occlusion, limited camera field of view, changing lighting, and room reverberation. Integrating audio and video information to locate and track speakers is therefore an active research topic. This thesis addresses several of these challenges within a Bayesian framework and studies two speaker tracking algorithms based on audio-video multimodal information fusion, one for spatially distributed sensors and one for co-located sensors.

For tracking speakers with spatially distributed datasets, a particle filtering algorithm based on the random finite set (RFS) framework is proposed. In the RFS approach, the computational cost grows rapidly as the number of speakers increases. To address this problem, a probability hypothesis density (PHD) filter is adopted and combined with a sequential Monte Carlo (SMC) implementation. In the proposed audio-visual mean-shift sequential Monte Carlo probability hypothesis density (AVMS-SMC-PHD) tracking algorithm, audio data determine when to propagate and re-allocate particles according to their types, and the mean-shift (MS) method is applied to the tracking system, which moves the estimated position closer to the speaker's true position and improves both the estimation accuracy and the computational efficiency of the algorithm.

For tracking speakers with co-located datasets, a novel audio-video information fusion (AV) tracking algorithm is proposed for multi-speaker tracking. Audio location information is combined with three-dimensional (3D) mouth information from face detection to improve the video likelihood function, and 3D mouth-height information assists the audio observations to improve the audio likelihood. Moreover, an adjustable weight is introduced to better integrate the audio positioning information and the 3D mouth information across different scenes. Compared with previous methods, the proposed algorithm removes the computation and comparison of the color model and directly fuses the 3D audio and video positioning information, which not only performs well in different scenarios but also greatly improves tracking efficiency.
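The thesis does not provide code; as a rough illustration of the fusion ideas described above, the Python sketch below shows how audio and video likelihoods might be combined with an adjustable weight to re-weight particles, and how a mean-shift step can refine the weighted estimate. All function names, noise levels (sigma_a, sigma_v, bandwidth), and the mixing weight alpha are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np

def gaussian_likelihood(particles, obs, sigma):
    """Isotropic Gaussian likelihood of each particle given one observation."""
    d2 = np.sum((particles - obs) ** 2, axis=1)
    return np.exp(-0.5 * d2 / sigma ** 2)

def fused_weights(particles, audio_obs, video_obs, alpha,
                  sigma_a=0.3, sigma_v=0.1):
    """Re-weight particles with an adjustable audio/video mixing weight alpha."""
    p_audio = gaussian_likelihood(particles, audio_obs, sigma_a)
    p_video = gaussian_likelihood(particles, video_obs, sigma_v)
    w = alpha * p_audio + (1.0 - alpha) * p_video
    return w / (w.sum() + 1e-12)

def mean_shift_estimate(particles, weights, bandwidth=0.2, iters=5):
    """Shift the weighted mean toward the local mode of the particle cloud."""
    x = np.average(particles, axis=0, weights=weights)
    for _ in range(iters):
        k = weights * np.exp(
            -0.5 * np.sum((particles - x) ** 2, axis=1) / bandwidth ** 2)
        if k.sum() < 1e-12:
            break
        x = np.average(particles, axis=0, weights=k)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_pos = np.array([1.0, 2.0, 1.5])          # speaker position (metres)
    particles = true_pos + rng.normal(0, 0.5, (500, 3))
    audio_obs = true_pos + rng.normal(0, 0.3, 3)  # noisy acoustic localisation
    video_obs = true_pos + rng.normal(0, 0.1, 3)  # noisy 3D mouth position
    w = fused_weights(particles, audio_obs, video_obs, alpha=0.4)
    print("estimate:", mean_shift_estimate(particles, w))
```

In this sketch a smaller alpha trusts the (typically more precise) visual mouth observation more, while a larger alpha leans on the acoustic localisation, mirroring the adjustable weighting between modalities described in the abstract.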
Keywords/Search Tags:Speaker tracking, audio and video fusion, particle filtering, likelihood function, probability hypothesis density