| Speaker tracking has significant research value and wide practical use in video conferencing, multimedia systems, intelligent surveillance, human-computer interaction, robotics, and other fields. Sound source localization often plays a fundamental role in speaker tracking, but it is susceptible to reverberation and noise, and tracking results become unreliable, especially in multi-speaker settings. The Kinect linear microphone array, which consists of four microphones, can effectively suppress noise and cancel echo. In this dissertation, the Kinect sensor is therefore used to capture and process the speaker's speech signal, and speakers are tracked with an adaptive beamforming algorithm. In multi-speaker settings, voiceprint features are extracted to verify the identity of a specific speaker, who is then tracked in real time. First, for multi-speaker settings, a speaker verification method based on the GMM-UBM model is used to authenticate the speaker. The method extracts MFCCs, which reflect human auditory characteristics, as voiceprint features and uses GMM-UBM as the speaker model. In the testing phase, a matching score is computed between the test speech and the trained model and compared with a preset threshold to produce the final decision. Second, an adaptive beamforming algorithm is used to solve the problem of sound source localization with the small number of elements in the Kinect microphone array. Finally, a specific-speaker tracking system is designed with three functional modules: audio acquisition and processing, speaker verification, and positioning and tracking. These modules are implemented with toolkits including Kinect for Windows SDK v1.8, OpenCV, and tsVPR. System tests show that adaptive beamforming can accurately locate the speaker's position in the speaker tracking system. In an ideal environment, the average positioning accuracy for a single specific speaker reached 93.3% with an RMSE below 6.4, and the accuracy for multiple speakers reached 89.5%. With ambient noise of 30-50 dB and reverberation times of 30 ms and 50 ms, and with the Kinect sensor's noise suppression and echo cancellation enabled, positioning accuracy for a single specific speaker reached 83.35% with an RMSE of 8.9, and accuracy for multiple speakers reached 81.27%. These results meet the system's performance requirements and show that the system is robust in noisy, reverberant indoor environments. |
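
The verification step in the abstract (a matching score compared with a preset threshold) can be illustrated with a minimal sketch. It assumes the matching score is the average per-frame log-likelihood ratio between the speaker model and the UBM, which is the usual choice for GMM-UBM systems but is not stated in the abstract. The function names, the toy two-dimensional "frames" standing in for MFCC vectors, the model parameters, and the threshold of 0.0 are all hypothetical and chosen only for illustration.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def verification_score(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio: speaker model versus UBM."""
    total = sum(
        gmm_log_likelihood(f, speaker_gmm) - gmm_log_likelihood(f, ubm)
        for f in frames
    )
    return total / len(frames)

def accept(frames, speaker_gmm, ubm, threshold=0.0):
    """Accept the claimed identity when the score exceeds the threshold."""
    return verification_score(frames, speaker_gmm, ubm) > threshold

# Hypothetical toy models (not taken from the dissertation).
ubm = [(0.5, [0.0, 0.0], [4.0, 4.0]), (0.5, [3.0, 3.0], [4.0, 4.0])]
speaker_gmm = [(1.0, [1.0, 1.0], [0.5, 0.5])]

target_frames = [[1.0, 1.1], [0.9, 1.0]]    # close to the speaker model
impostor_frames = [[4.0, 4.0], [3.5, 4.2]]  # far from the speaker model

print(accept(target_frames, speaker_gmm, ubm))
print(accept(impostor_frames, speaker_gmm, ubm))
```

In a real system the speaker model would typically be adapted from the UBM and the threshold tuned on held-out data; neither step is shown here.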