
Multi-speaker Recognition Based On Audio-video Feature Fusion In Smart Environment

Posted on: 2013-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: L Z Yu
Full Text: PDF
GTID: 2218330374955653
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
The human brain fuses information from several senses, which lets people accurately identify what is around them even in complex environments. With the rapid growth of the information industry, speaker recognition technology, which imitates this human ability well and can even surpass it under some conditions, has received constant attention in the field of pattern recognition. Speaker recognition in intelligent meeting environments is also a hot topic in human-computer interaction. How to draw on the brain's fusion ability and use multi-modal fusion to combine a speaker's audio and visual information effectively, so that recognition is robust and accurate, has become an active research topic in intelligent information processing.

After studying and summarizing the basic theory of audio-based speaker recognition, video feature extraction, multi-modal information fusion, and the key techniques of multi-speaker recognition, this thesis proposes a multi-speaker recognition algorithm based on audio-video feature fusion. The work is carried out in three parts, as follows.

First, an improved speaker clustering initialization is presented together with GMM-based multi-speaker recognition. Linear initialization of multi-speaker clustering has poor accuracy, so an improved initialization method is proposed. It uses the Bayesian Information Criterion (BIC) to detect and segment the initial clusters produced by linear initialization, which effectively improves the purity of the initial speaker clusters. The method is then applied to a GMM multi-speaker recognition system.

Second, a motion intensity feature derived from the video signal is introduced, and a new multi-speaker recognition algorithm based on MFCC and motion intensity clustering initialization is presented.
The method uses the motion intensity feature of each video time frame to find the initial speaker clusters during clustering initialization. It makes full use of the correlation between audio and video information and effectively improves the purity of the initial speaker clusters. At this stage, however, it does not yet perform true multi-modal audio-video fusion; it lays the groundwork for the next part of the study.

Lastly, the thesis presents a multi-speaker recognition algorithm based on audio-video feature fusion, which exploits the spatio-temporal correlation and complementarity between a speaker's speech production and visual motion. Audio features extracted from the microphones and motion intensity features extracted from the video signal are used to build an audio-stream model and a video-stream model respectively. A fusion formula is then applied to the two streams to perform model-level fusion during the speaker clustering phase, which yields the corresponding trained models. Finally, these models are used in the GMM multi-speaker recognition system.

Simulation results show that the proposed recognition algorithm based on audio-video feature fusion is feasible. In multi-speaker recognition, the critical techniques are speaker segmentation and clustering, and the choice of initial clusters greatly affects the overall accuracy of the recognition system. The initialization methods proposed in the first two parts effectively improve the purity of the initial clusters and reduce the system's recognition error rate to some extent. Compared with the audio-only system, the algorithm based on audio-video feature fusion, which adds video features to the audio-only system, is more robust and recognizes speakers better, especially under complex conditions such as dynamic meeting environments and overlapping speech.
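The abstract does not give the exact form of the BIC test used to detect and segment the linearly initialized clusters. As a rough illustration only, the following Python sketch shows the common delta-BIC split test: a block of feature frames is modeled either as one Gaussian or as two Gaussians split at frame t, and a positive score suggests the block holds two speakers. The diagonal-covariance simplification, the function names (`log_det_diag`, `delta_bic`, `best_split`), and the parameters `lam` and `margin` are assumptions made for this sketch and are not taken from the thesis.

```python
import math
import random


def log_det_diag(frames):
    """Sum of log per-dimension variances (log-determinant of a diagonal covariance)."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for k in range(dim):
        column = [frame[k] for frame in frames]
        mean = sum(column) / n
        variance = sum((v - mean) ** 2 for v in column) / n
        total += math.log(variance + 1e-8)  # small floor avoids log(0)
    return total


def delta_bic(frames, t, lam=1.0):
    """Delta-BIC for splitting `frames` at index t; positive favors two models."""
    n = len(frames)
    dim = len(frames[0])
    # Likelihood gain of two diagonal Gaussians over a single one.
    gain = 0.5 * (
        n * log_det_diag(frames)
        - t * log_det_diag(frames[:t])
        - (n - t) * log_det_diag(frames[t:])
    )
    # The second model adds `dim` means and `dim` variances (assumed diagonal).
    penalty = 0.5 * (2 * dim) * math.log(n)
    return gain - lam * penalty


def best_split(frames, margin=20, lam=1.0):
    """Return (t, score) for the best split if its delta-BIC is positive, else None."""
    best = None
    for t in range(margin, len(frames) - margin):
        score = delta_bic(frames, t, lam)
        if best is None or score > best[1]:
            best = (t, score)
    if best is not None and best[1] > 0:
        return best
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for feature frames: two "speakers" with different means.
    speaker_a = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
    speaker_b = [[rng.gauss(5.0, 1.0) for _ in range(4)] for _ in range(200)]
    print(best_split(speaker_a + speaker_b))
```

In a pipeline like the one described, the frames would be MFCC vectors from one initial cluster rather than synthetic data. A cluster for which `best_split` returns a split point would be cut there before clustering continues. The penalty weight `lam` is a tunable value; raising it makes the test less willing to split.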
Keywords/Search Tags: multi-speaker recognition, information fusion, audio feature, video feature, motion intensity feature, clustering initialization, Gaussian Mixture Model