The Study Of Speaker Diarization Based On Factor Analysis

Posted on: 2017-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: R Li
GTID: 2308330485954827
Subject: Circuits and Systems
Abstract/Summary:
With the rapid development of computing and audio processing technology, there is a growing demand for retrieving specific voices of interest from huge amounts of audio data. At the same time, managing the many types of audio documents reasonably and effectively remains a major challenge. Speaker diarization has emerged as a key technology to meet these needs. It mainly involves two processes: speaker segmentation and speaker clustering.

A speaker diarization system has almost no prior information to use as a reference, and it is easily influenced by the environment and by the modeling method, so the speaker purity of the classes obtained by clustering segments cannot be guaranteed. In addition, Hierarchical Agglomerative Clustering (HAC) based on a distance criterion passes clustering errors upward from one merge level to the next. This dissertation therefore presents our research on speaker segmentation, clustering modeling methods and class purification. The main work and innovations are as follows:

Firstly, front-end voice activity detection (VAD) and class purification. To handle the low-energy speech and noisy speech that the baseline systems face, we introduce deep learning to improve voice activity detection at the front end of the speaker diarization system. Because traditional HAC lets clustering errors propagate from layer to layer, we also present a class purification method based on the short-time Bayesian Information Criterion (BIC), which weakens the influence of this upward propagation of clustering errors.
The experimental results show that voice activity detection based on deep learning effectively reduces false alarms and missed detections in the speaker diarization task and, as a result, also reduces the speaker diarization error rate. Meanwhile, clustering with short-time BIC class purification corrects some wrongly clustered speaker segments, which improves the purity of the back-end speaker clustering process.

Secondly, the modeling method for speaker change point detection. To improve the accuracy of speaker segmentation, we explore a Deep Neural Network (DNN) based modeling method for speaker change point detection, exploiting its powerful modeling capacity. The experimental results show that, compared with the traditional BIC-based modeling approach, the deep learning method brings a clear improvement, both in the detection accuracy and recall of change points and in the speaker error rate of the entire diarization system.

Thirdly, the speaker clustering modeling method based on factor analysis. When the traditional Bayesian Information Criterion is used as the similarity measure for speaker diarization, it gives good results on short dialogue tasks. However, as the conversation gets longer, the single Gaussian model used by BIC struggles to describe the distribution of different speakers' data. Moreover, it is difficult to set the threshold that separates same-speaker pairs from different-speaker pairs during hierarchical agglomerative clustering. To address this problem, this dissertation explores a fusion method based on short-time BIC and long-term Probabilistic Linear Discriminant Analysis (PLDA), which makes full use of the reliability of BIC in short-term clustering and the excellent discriminative power of PLDA in long-term clustering.
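
As an illustration of the BIC similarity measure discussed here, the following is a minimal sketch of the ΔBIC comparison between two segments, each modeled by a single full-covariance Gaussian. It follows the standard textbook formulation and is not the dissertation's own implementation; the function name `delta_bic`, the `penalty_weight` parameter and the synthetic feature data are assumptions made for this example.

```python
import numpy as np


def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC between two segments of shape (frames, dims).

    Each segment, and their union, is modeled by one full-covariance
    Gaussian. Under this sign convention a positive value favors
    "two different speakers" and a negative value favors merging.
    """
    z = np.vstack([x, y])
    n_x, n_y, n_z = len(x), len(y), len(z)
    dims = z.shape[1]

    def logdet_cov(frames):
        # Maximum-likelihood covariance; slogdet is numerically safer than log(det).
        cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Likelihood gain from using two Gaussians instead of one.
    gain = 0.5 * (n_z * logdet_cov(z) - n_x * logdet_cov(x) - n_y * logdet_cov(y))
    # Penalty for the extra mean vector and covariance matrix.
    n_params = dims + dims * (dims + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n_z)
    return gain - penalty


# Synthetic "features" (hypothetical data, not from the dissertation).
rng = np.random.default_rng(0)
seg_a = rng.normal(0.0, 1.0, size=(200, 4))
seg_b = rng.normal(0.0, 1.0, size=(200, 4))   # same distribution as seg_a
seg_c = rng.normal(5.0, 1.0, size=(200, 4))   # shifted mean, a "different speaker"

print("same source:     ", delta_bic(seg_a, seg_b))
print("different source:", delta_bic(seg_a, seg_c))
```

The decision threshold is zero only in this idealized form. In practice `penalty_weight` has to be tuned, which is one face of the threshold difficulty the abstract describes for long conversations.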
The experimental results show that, under the factor analysis framework, the speaker information modeling method effectively reduces the speaker Diarization Error Rate (DER), with a relative improvement of 34.2%.

Fourthly, speaker clustering optimization based on the Variational Bayesian (VB) method. Combined with the total variability space of the factor analysis framework, we convert the traditional hierarchical clustering method into a soft clustering method that maximizes the posterior probability that a segment belongs to a speaker and guarantees that the objective function is optimized. This variational Bayesian method can correct some of the speaker segments wrongly assigned by hierarchical agglomerative clustering, improve the initial class purity for back-end PLDA clustering, and reduce the speaker diarization error rate.
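
To make the error measures mentioned in this abstract concrete, the sketch below computes a DER from its usual three components (false alarm, missed speech and speaker error time, divided by total reference speech time) and the relative reduction between two systems. The function names and all numeric values are hypothetical and chosen only for illustration. They are not results from the dissertation.

```python
def diarization_error_rate(false_alarm, missed_speech, speaker_error, total_speech):
    """DER: wrongly scored time as a fraction of reference speech time.

    All arguments are durations in the same unit (e.g. seconds).
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (false_alarm + missed_speech + speaker_error) / total_speech


def relative_reduction(baseline_der, new_der):
    """Relative improvement of new_der over baseline_der (0.25 means 25%)."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return (baseline_der - new_der) / baseline_der


# Hypothetical durations in seconds, for illustration only.
baseline = diarization_error_rate(4.0, 6.0, 10.0, 100.0)
improved = diarization_error_rate(3.0, 4.0, 8.0, 100.0)
print(f"baseline DER: {baseline:.3f}")
print(f"improved DER: {improved:.3f}")
print(f"relative reduction: {relative_reduction(baseline, improved):.1%}")
```

A relative figure such as the 34.2% quoted above is read in this sense: it is a fraction of the baseline DER, not a drop in absolute percentage points.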
Keywords/Search Tags: Speaker Diarization, Bayesian Information Criterion, Deep Learning, Factor Analysis, Probabilistic Linear Discriminant Analysis, Variational Bayesian