Partial Least Squares Based Total Variability Space Modeling of i-vector for Speaker Recognition

Posted on: 2016-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: C Chen
Full Text: PDF
GTID: 2308330479490099
Subject: Computer Science and Technology
Abstract/Summary:
As a key technique for multimedia data analysis, speaker recognition is widely used in access control, transaction authentication, law enforcement, speech data management, and audio monitoring. As an effective representation for speech signals of different lengths, the i-vector has drawn considerable attention in speaker recognition and has achieved better performance than traditional methods. Training the total variability space is one of the most important parts of the i-vector method. However, conventional training uses only the relationships among speaker features and ignores the a priori category information of speakers, which is meaningful for prediction and classification. As a consequence, the resulting total variability space is suboptimal.

In this thesis, we propose a new approach that models the total variability space by introducing the speakers' a priori category information through partial least squares (PLS). We first train a Gaussian mixture model-universal background model (GMM-UBM) and extract the speaker GMM supervectors from it. We then estimate the total variability space from the relationship between speaker features and their category information, and extract i-vectors from it. Finally, we apply a channel compensation technique, within-class covariance normalization (WCCN), to remove nuisance directions, and use cosine distance scoring (CDS) for the decision. We evaluate the method on the King-ASR-009 dataset and on the NIST SRE 2008 short2-short3 and 8conv-short3 sets. The experimental results show that the proposed method achieves better performance than traditional methods.

Because partial least squares is sensitive to between-class data but not to abnormal data, training the model often leads to a decline in system performance. We therefore propose a regression punishment partial least squares based approach to modeling the total variability space. We divide the dataset into two parts, one for training the initial total variability space and the other for regression punishment. The experimental results show that, on the King-ASR-009 dataset, this approach achieves better performance than traditional methods in both speaker verification and identification.
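
The back end of the pipeline described in the abstract (a label-aware projection, then WCCN, then cosine scoring) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the "supervectors" are synthetic Gaussian data rather than GMM-UBM output, the projection is a generic NIPALS-style PLS against one-hot speaker labels rather than the thesis's exact formulation, and the function names (`fit_pls_rotation`, `fit_wccn`, `cosine_scores`, `run_demo`) are hypothetical. The regression punishment variant is not shown because the abstract does not specify it.

```python
import numpy as np


def fit_pls_rotation(X, Y, n_components):
    """Generic NIPALS-style PLS (assumed stand-in for the thesis's method).

    Returns (x_mean, R) such that latent scores = (X - x_mean) @ R.
    """
    x_mean = X.mean(axis=0)
    Xk = X - x_mean
    Yc = Y - Y.mean(axis=0)
    weights, loadings = [], []
    for _ in range(n_components):
        # Direction in feature space with maximal covariance with the labels.
        u, _, _ = np.linalg.svd(Xk.T @ Yc, full_matrices=False)
        w = u[:, 0]
        t = Xk @ w
        tt = t @ t
        if tt < 1e-12:
            break
        p = Xk.T @ t / tt
        Xk = Xk - np.outer(t, p)  # deflate X before the next component
        weights.append(w)
        loadings.append(p)
    W = np.column_stack(weights)
    P = np.column_stack(loadings)
    # Rotation that maps the centered, undeflated X directly to the scores.
    R = W @ np.linalg.inv(P.T @ W)
    return x_mean, R


def fit_wccn(vectors, labels):
    """Return B with B @ B.T = inv(W), W being the mean within-class covariance."""
    classes = np.unique(labels)
    dim = vectors.shape[1]
    W = np.zeros((dim, dim))
    for c in classes:
        Vc = vectors[labels == c]
        Vc = Vc - Vc.mean(axis=0)
        W += Vc.T @ Vc / len(Vc)
    W /= len(classes)
    W += 1e-6 * np.eye(dim)  # small ridge for numerical stability
    return np.linalg.cholesky(np.linalg.inv(W))


def cosine_scores(Z):
    """Pairwise cosine similarity between the rows of Z."""
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    return Zn @ Zn.T


def run_demo(seed=0, n_speakers=5, n_utts=20, dim=50, n_components=4):
    """Synthetic trial: returns (mean target score, mean non-target score)."""
    rng = np.random.default_rng(seed)
    means = 3.0 * rng.standard_normal((n_speakers, dim))
    labels = np.repeat(np.arange(n_speakers), n_utts)

    def sample():
        # Stand-in for GMM supervectors: speaker mean plus unit noise.
        return means[labels] + rng.standard_normal((labels.size, dim))

    X_train, X_test = sample(), sample()
    Y_train = np.eye(n_speakers)[labels]  # one-hot category information

    x_mean, R = fit_pls_rotation(X_train, Y_train, n_components)
    B = fit_wccn((X_train - x_mean) @ R, labels)
    Z = (X_test - x_mean) @ R @ B  # project, then apply WCCN

    S = cosine_scores(Z)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(labels.size, dtype=bool)
    return S[same & off_diag].mean(), S[~same].mean()


if __name__ == "__main__":
    target, non_target = run_demo()
    print("mean target score:", target)
    print("mean non-target score:", non_target)
```

On this toy data, same-speaker pairs should score higher on average than different-speaker pairs. That only shows the mechanics of the three steps and says nothing about the results reported in the thesis.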
Keywords/Search Tags: speaker recognition, i-vector, total variability space, category information, partial least squares