Robust speaker clustering under variation in data characteristics

Posted on:2010-10-16

Degree:Ph.D

Type:Dissertation

University:University of Southern California

Candidate:Han, Kyu Jeong

GTID:1448390002481647

Subject:Engineering

Abstract/Summary:

Speaker clustering is the process of grouping a set of input speech data (or speech segments) by speaker identity in an unsupervised way, based on the similarity of speaker-specific characteristics between the data. The process identifies the speech segments that come from the same speaker without any prior speaker-specific information about the given input data. This speaker-oriented, unsupervised classification of speech data can be applied in various ways as a pre-processing step for speech/speaker recognition or multimedia data segmentation/classification. Speaker clustering has therefore recently attracted much attention in speech recognition and multimedia data processing research.

One major, yet unsolved, issue in speaker clustering is unreliable clustering performance under variation in the input speech data. In this dissertation, we address this problem in the framework of agglomerative hierarchical speaker clustering (AHSC) from two perspectives: stopping point estimation and inter-cluster distance measurement. To improve the robustness of stopping point estimation for AHSC, we propose a new statistical measure called information change rate (ICR), which improves estimation of the optimal stopping point. The ICR-based stopping point estimation method is verified, both empirically and theoretically, to be more robust to variation in the input speech data than the conventional method based on the Bayesian information criterion (BIC). To improve the robustness of inter-cluster distance measurement for AHSC, we also propose selective AHSC and incremental Gaussian mixture cluster modeling. These two approaches are proven to make speaker clustering performance much more reliable under variation in the input speech data.

Building on these results, we extend our interest to implementing a speaker diarization system that is more robust to variation in the input audio data. (Speaker diarization refers to an automated process that annotates a given audio source in terms of "who spoke when".) Focusing on speaker diarization of meeting conversation speech, we propose two refinement schemes to further improve the reliability of speaker clustering performance within the diarization framework. One is the selection of representative speech segments, and the other is interaction pattern modeling between meeting participants. Both are experimentally verified to enhance the reliability of speaker clustering performance and hence to improve overall diarization accuracy under variation in the input audio data.
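
To make the role of the stopping point concrete, the following is a minimal sketch of agglomerative hierarchical clustering with the conventional BIC-based stopping rule that the abstract names as the baseline. It is not the dissertation's implementation: the abstract does not define ICR, so ICR is not shown here. The sketch assumes each cluster is modeled by a single full-covariance Gaussian and that each segment is an array of feature frames; the names `log_det_cov`, `delta_bic`, `ahsc`, and `penalty_weight` are hypothetical and chosen only for illustration.

```python
import numpy as np


def log_det_cov(frames):
    """Log-determinant of the ML covariance of a (n_frames, dim) array."""
    cov = np.cov(frames, rowvar=False, bias=True)
    # Small ridge (an assumption of this sketch) keeps the matrix well conditioned.
    cov = np.atleast_2d(cov) + 1e-6 * np.eye(frames.shape[1])
    return np.linalg.slogdet(cov)[1]


def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC for merging clusters x and y; negative values favor merging."""
    n_x, n_y = len(x), len(y)
    n, dim = n_x + n_y, x.shape[1]
    merged = np.vstack([x, y])
    # Free parameters of one Gaussian: mean plus symmetric covariance.
    n_params = dim + dim * (dim + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    gain = 0.5 * (n * log_det_cov(merged)
                  - n_x * log_det_cov(x)
                  - n_y * log_det_cov(y))
    return gain - penalty


def ahsc(segments, penalty_weight=1.0):
    """Merge the closest pair repeatedly; stop when no pair has negative delta-BIC.

    Returns a list of clusters, each a list of the original segment indices.
    """
    clusters = [np.asarray(s, dtype=float) for s in segments]
    labels = [[i] for i in range(len(clusters))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], penalty_weight)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:  # estimated stopping point: no merge is supported
            break
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        labels[i] = labels[i] + labels[j]
        del clusters[j], labels[j]
    return labels
```

Calling `ahsc(segments)` with a list of `(n_frames, dim)` arrays returns the groups of segment indices that remain when merging stops. In this sketch the outcome depends on `penalty_weight`, a tuning parameter that typically has to be adjusted to the data at hand, which illustrates the kind of sensitivity to input variation that the abstract describes for the BIC-based method.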

Keywords/Search Tags:

Speaker clustering, Data, Variation, Stopping point estimation, Improve, Diarization, AHSC
