
Robust speaker clustering under variation in data characteristics

Posted on: 2010-10-16
Degree: Ph.D.
Type: Dissertation
University: University of Southern California
Candidate: Han, Kyu Jeong
Full Text: PDF
GTID: 1448390002481647
Subject: Engineering
Abstract/Summary:
Speaker clustering is the process of classifying a set of input speech data (or speech segments) by speaker identity in an unsupervised way, based on the similarity of speaker-specific characteristics between the data. The process identifies the speech segments that come from the same speaker without any prior speaker-specific information about the given input data. This speaker-oriented, unsupervised classification of speech data can be applied in various ways as a pre-processing step for speech/speaker recognition or multimedia data segmentation/classification. Speaker clustering has therefore recently attracted much attention in speech recognition and multimedia data processing research.
One major, still unsolved issue in speaker clustering research is unreliable clustering performance under variation in the input speech data. In this dissertation, we address this problem within the framework of agglomerative hierarchical speaker clustering (AHSC) from two perspectives: stopping point estimation and inter-cluster distance measurement. To improve the robustness of stopping point estimation for AHSC under variation in the input speech data, we propose a new statistical measure called the information change rate (ICR), which improves estimation of the optimal stopping point. The ICR-based stopping point estimation method is verified, both empirically and theoretically, to be more robust to variation in the input speech data than the conventional BIC-based method. To improve the robustness of inter-cluster distance measurement for AHSC under the same variation, we also propose selective AHSC and incremental Gaussian mixture cluster modeling. These two approaches are proven to make speaker clustering performance much more reliable under variation in the input speech data.
Based on these results on robust speaker clustering, we extend our interest to implementing a speaker diarization system that is more robust to variation in the input audio data. (Speaker diarization refers to an automated process that annotates a given audio source in terms of "who spoke when".) Focusing on speaker diarization of meeting conversation speech, we propose two refinement schemes to further improve the reliability of speaker clustering performance within the diarization framework under variation in the input audio data. One is the selection of representative speech segments and the other is interaction pattern modeling between meeting participants. Both are experimentally verified to enhance the reliability of speaker clustering performance and hence to improve overall diarization accuracy under variation in the input audio data.
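For readers unfamiliar with AHSC, the following is a minimal sketch of the conventional BIC-based baseline that the abstract contrasts with the ICR, not the dissertation's own method. It assumes that each segment is a matrix of feature frames (one row per frame), that each cluster is modeled by a single full-covariance Gaussian, and that the standard delta-BIC with penalty weight `lam` serves as both the inter-cluster distance and the stopping rule. The function names `delta_bic` and `ahsc` and the small covariance regularizer are illustrative assumptions. The ICR itself is not defined in this abstract, so it is not sketched here.

```python
import numpy as np


def delta_bic(x, y, lam=1.0):
    """Delta-BIC between two clusters of feature frames (rows = frames).

    Positive values favor keeping the clusters separate, negative values
    favor merging them. Each cluster is modeled by one full-covariance
    Gaussian (an assumption of this sketch).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    d = x.shape[1]

    def logdet(a):
        # Maximum-likelihood covariance with a small ridge for stability.
        cov = np.cov(a, rowvar=False, bias=True) + 1e-6 * np.eye(d)
        return np.linalg.slogdet(cov)[1]

    merged = np.vstack([x, y])
    gain = 0.5 * (n * logdet(merged) - n1 * logdet(x) - n2 * logdet(y))
    # Model-complexity penalty: d mean terms plus d(d+1)/2 covariance terms.
    penalty = 0.5 * (d + d * (d + 1) / 2) * np.log(n)
    return gain - lam * penalty


def ahsc(segments, lam=1.0):
    """Agglomerative clustering of segments with a delta-BIC stopping rule.

    Returns a list of clusters, each a list of original segment indices.
    """
    clusters = [np.asarray(s, dtype=float) for s in segments]
    members = [[i] for i in range(len(clusters))]
    while len(clusters) > 1:
        # Find the closest pair of clusters under delta-BIC.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        # Stopping point: no remaining pair is better modeled as one cluster.
        if score >= 0:
            break
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        members[i] = members[i] + members[j]
        del clusters[j]
        del members[j]
    return members
```

Calling `ahsc(segments)` on a list of frame matrices returns groups of segment indices, one group per hypothesized speaker. The stopping decision in this baseline depends on the penalty weight `lam` and on how much data each cluster holds, which is the kind of sensitivity to input data variation that the abstract says the ICR is meant to reduce.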
Keywords/Search Tags:Speaker clustering, Data, Variation, Stopping point estimation, Improve, Diarization, AHSC