
Research On Key Techniques For SVM-based Speaker Verification

Posted on: 2012-03-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y H Long
Full Text: PDF
GTID: 1488303338474074
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of speaker recognition techniques, more and more researchers are focusing on the details of speaker recognition systems in order to bring their performance closer to what practical applications require. Recently, discriminative and representative front-end feature extraction, channel compensation under adverse conditions, and system fusion approaches have attracted increasing attention. In this context, this thesis focuses on the key techniques in speaker verification needed to achieve a highly competitive system. It presents an in-depth and systematic study of the SVM-based speaker verification system and describes our innovations in model training, channel compensation, system fusion and feature extraction, which greatly improve our systems' performance.

Firstly, we build a baseline system based on the GMM supervector-Support Vector Machine (GSV-SVM), and carry out extensive analysis and modification of the whole speaker verification framework. From a large number of experimental results, we find that the data imbalance between target and imposter speakers leads to a large performance reduction in the GSV-SVM system. Two approaches, SMD (Speaker Model Distance) and SVRT (Support Vector Retraining), are proposed in this thesis to solve the data imbalance problem. The SMD approach chooses the imposter samples that are most similar to the target speaker according to the distance between speaker models. The SVRT method, in contrast, chooses imposter samples from among the support vectors obtained by SVM model training, which carry more discriminative information. Our experimental results validate that both proposed methods outperform random selection of SVM imposter samples.

Secondly, this thesis provides several improvements to Nuisance Attribute Projection (NAP) to alleviate channel distortions and variabilities under complicated speaker verification conditions.
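As a rough illustration of the baseline NAP idea mentioned above (not the improved variants of this thesis, which the abstract does not detail), the following Python sketch estimates nuisance directions from within-speaker session variation and projects them out of a supervector. The function names, the power-iteration estimator and the list-based vector representation are assumptions made purely for illustration.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def session_deviations(vectors, speakers):
    """Subtract each speaker's mean vector, leaving session/channel variation."""
    groups = {}
    for vec, spk in zip(vectors, speakers):
        groups.setdefault(spk, []).append(vec)
    means = {spk: [sum(col) / len(vs) for col in zip(*vs)]
             for spk, vs in groups.items()}
    return [[x - m for x, m in zip(vec, means[spk])]
            for vec, spk in zip(vectors, speakers)]


def remove_direction(vec, unit):
    """Remove the component of vec that lies along the unit vector."""
    c = dot(vec, unit)
    return [x - c * u for x, u in zip(vec, unit)]


def top_direction(devs, iters=100, seed=0):
    """Power iteration for the leading eigenvector of sum_d d d^T.

    Returns None when the deviations carry no remaining variation.
    """
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in devs[0]]
    for _ in range(iters):
        w = [0.0] * len(u)
        for d in devs:
            c = dot(d, u)
            w = [wi + c * di for wi, di in zip(w, d)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:
            return None
        u = [wi / norm for wi in w]
    return u


def nuisance_basis(devs, k):
    """Estimate up to k orthonormal nuisance directions by deflation."""
    basis = []
    for _ in range(k):
        u = top_direction(devs)
        if u is None:
            break
        basis.append(u)
        # Deflate: strip the direction just found before searching again.
        devs = [remove_direction(d, u) for d in devs]
    return basis


def nap_project(vec, basis):
    """Apply P = I - U U^T, where the columns of U are the basis vectors."""
    for u in basis:
        vec = remove_direction(vec, u)
    return vec
```

In a real system the vectors would be high-dimensional GMM supervectors and the projection would be applied to every training and test vector before SVM modeling; a dense eigensolver would normally replace the power iteration used here.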
Meanwhile, a Session Variation Principal Component Analysis (SVPCA) algorithm integrated with Within-Class Covariance Normalization (WCCN) is proposed to enhance the channel robustness of our system. The proposed algorithm not only utilizes the channel label information provided by the development data, but also fully exploits the speaker identity information contained in the data, to reduce the mismatch between training and testing. Experiments on NIST SRE evaluation tasks demonstrate its effectiveness and significant improvements in performance.

Thirdly, this thesis proposes a new speaker verification system based on high-level prosodic features. Prosodic features are first extracted. We then use negative within-class covariance normalization (WCCN) to provide a robust representation of the prosodic features, and apply support vector regression (SVR) to achieve good generalization and approximation in speaker modeling. Furthermore, a new feature combination method called segmental weight fusion (SWF) is proposed to combine the acoustic and prosodic information in different score regions for a more reliable fusion system.

Finally, this thesis provides a new set of features based on the spectral subband energy extracted from the harmonic and noise parts of speech, which are decomposed by the Harmonic plus Noise Model (HNM). These new features include the Spectral Subband Energy Ratios (SSERs), the Harmonic Spectral Subband Energy (HSSE) and the Noise Spectral Subband Energy (NSSE). In addition, a Pitch Synchronous-Energy VAD post-processing method is proposed to improve the performance of all these features. To examine the effectiveness of the new features, experiments were conducted on the 863 Putonghua Corpus, which was recorded in a clean environment with little channel distortion. Preliminary results show that the proposed features significantly outperform conventional MFCC features.
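To make the three subband feature types above more concrete, here is a minimal Python sketch that computes per-band log energies and their ratio from the power spectra of the harmonic and noise parts of a single frame. It assumes the HNM decomposition has already been carried out elsewhere; the equal-width band layout, the log-ratio form of the SSER, the energy floor and all function names are illustrative assumptions, since the abstract does not specify them.

```python
import math


def subband_energies(power_spectrum, n_bands):
    """Sum power-spectrum bins into n_bands contiguous, near-equal-width bands."""
    n = len(power_spectrum)
    if not 1 <= n_bands <= n:
        raise ValueError("n_bands must be between 1 and the number of bins")
    # Integer band edges; strictly increasing because n_bands <= n.
    edges = [(i * n) // n_bands for i in range(n_bands + 1)]
    return [sum(power_spectrum[edges[i]:edges[i + 1]]) for i in range(n_bands)]


def hnm_subband_features(harmonic_ps, noise_ps, n_bands, floor=1e-10):
    """Return (HSSE, NSSE, SSER) lists for one frame.

    HSSE and NSSE are log subband energies of the harmonic and noise parts;
    SSER is taken here as their difference, i.e. a log energy ratio (assumed).
    """
    if len(harmonic_ps) != len(noise_ps):
        raise ValueError("harmonic and noise spectra must have equal length")
    h = subband_energies(harmonic_ps, n_bands)
    z = subband_energies(noise_ps, n_bands)
    # Floor the energies so that silent bands do not produce log(0).
    hsse = [math.log(max(e, floor)) for e in h]
    nsse = [math.log(max(e, floor)) for e in z]
    sser = [a - b for a, b in zip(hsse, nsse)]
    return hsse, nsse, sser
```

A positive SSER value in a band indicates that the harmonic part carries more energy than the noise part there; how these values are post-processed and fed to the speaker models in the thesis is not described in the abstract.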
Keywords/Search Tags: Speaker Verification, GMM Mean Supervector, Support Vector Machine, Channel Compensation, Prosodic Feature, Spectral Subband Energy