
Multi-speaker Tracking Based On Audio-video Information Fusion In Smart Environment

Posted on: 2012-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: J R Zheng
Full Text: PDF
GTID: 2178330335466800
Subject: Control theory and control engineering
Abstract/Summary:
The human brain lets people track and identify objects accurately in complex environments by integrating information from multiple sensory organs. In smart environments, speaker tracking is a major research area of human-computer interaction. How to make full use of multi-modal sensor information, including the voice and video image data of the same speaker, to achieve robust and accurate tracking by drawing on the brain's integration mechanism is attracting growing attention from researchers in heterogeneous information fusion.

After summarizing the basic theory and research status of multi-source information fusion, video tracking, sound source localization and filtering algorithms, this thesis proposes two novel human tracking algorithms based on multi-source information fusion. One is multi-person tracking based on the fusion of multiple video features; the other is speaker tracking based on audio-video information fusion.

In the person tracking system based on multiple video feature fusion, skin color is used because it is robust to rotation and occlusion, and the color likelihood model is constructed from a color histogram. In addition, an edge gradient search strategy is used to obtain the contour likelihood model, with the contour features representing the shape of the target. Finally, the color and contour information are integrated in a particle filter framework to keep tracking multiple persons.

In the speaker tracking system based on audio-video fusion, the complementarity of voice and video images from the same speaker is exploited: sound source localization based on microphone time delays and color information based on mean shift are used to establish the audio model and the video model, respectively. The importance particle filter (IPF) is then used to create the fusion likelihood model as well as the fusion importance function from which particles are sampled. A closed-loop processing framework with a feedback process is adopted to improve tracking accuracy and completeness.

Experiments on real-world data show that the two proposed information fusion based tracking algorithms are feasible. The multi-person tracking algorithm based on multiple video feature fusion is robust to illumination changes and background clutter, while the speaker tracking approach based on audio-video fusion accurately tracks the main speaker in a meeting and maintains good tracking performance even under speaker movement, posture changes and other complex conditions.
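
The idea of weighting particles by a fused audio-video likelihood can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes a one-dimensional speaker position, a Gaussian audio likelihood centred on a position estimate obtained from microphone time delays, and a color likelihood derived from the Bhattacharyya coefficient between a candidate histogram and a reference histogram. For simplicity it uses a plain bootstrap particle filter with a random-walk proposal instead of the fusion importance function described above. The function names, the `hist_at` callback and all sigma values are hypothetical and chosen only for illustration.

```python
import math
import random


def bhattacharyya(p, q):
    """Similarity between two normalized histograms (1.0 means identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def color_likelihood(hist, ref_hist, sigma=0.2):
    """Video cue: large when the candidate histogram matches the reference."""
    d2 = max(0.0, 1.0 - bhattacharyya(hist, ref_hist))  # squared distance
    return math.exp(-d2 / (2.0 * sigma ** 2))


def audio_likelihood(x, audio_pos, sigma=0.3):
    """Audio cue: Gaussian around the time-delay based position estimate."""
    return math.exp(-((x - audio_pos) ** 2) / (2.0 * sigma ** 2))


def normalize(weights):
    """Scale weights to sum to 1; fall back to uniform if all are zero."""
    total = sum(weights)
    if total == 0.0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def systematic_resample(particles, weights, rng):
    """Draw len(particles) samples with probability proportional to weight."""
    n = len(particles)
    start = rng.random() / n
    out, cum, i = [], weights[0], 0
    for k in range(n):
        u = start + k / n
        while u > cum and i < n - 1:
            i += 1
            cum += weights[i]
        out.append(particles[i])
    return out


def fused_step(particles, audio_pos, hist_at, ref_hist, rng, motion_std=0.1):
    """One predict / weight / resample cycle with a fused likelihood.

    hist_at(x) is a hypothetical callback that returns the normalized
    color histogram observed at position x in the current video frame.
    """
    # Predict: random-walk motion model (an assumption of this sketch).
    moved = [x + rng.gauss(0.0, motion_std) for x in particles]
    # Weight: treat the two cues as independent and multiply them.
    weights = normalize([
        audio_likelihood(x, audio_pos) * color_likelihood(hist_at(x), ref_hist)
        for x in moved
    ])
    # Estimate: weighted mean of the particle positions.
    estimate = sum(w * x for w, x in zip(weights, moved))
    return systematic_resample(moved, weights, rng), estimate
```

To use it, initialize the particles over the plausible position range and call `fused_step` once per frame with the current audio estimate and histogram callback. Multiplying the two likelihoods means a particle needs support from both cues to keep a high weight, which is one simple way to exploit their complementarity.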
Keywords/Search Tags: audio-video, heterogeneous sensor fusion, target tracking, mean shift, sound source localization, skin color histogram, importance particle filter