
SVM Speaker Verification Based On Prosodic Feature

Posted on: 2011-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: X Z Huang
Full Text: PDF
GTID: 2178360308955461
Subject: Circuits and Systems
Abstract/Summary:
As an effective biometric feature, the speech signal is particularly useful for identification. Text-independent speaker recognition is one of the primary research fields of speech signal processing; it is of great theoretical significance and has a wide variety of applications.

The National Institute of Standards and Technology (NIST) has coordinated Speaker Recognition Evaluations since 1996 to investigate and measure the latest approaches. The evaluations represent the state of the art in speaker recognition. NIST sets up several tasks to examine speaker recognition performance under different circumstances, and provides participants with telephone and broadcast speech data covering multiple channels and various environments, together with the evaluation specifications and a common evaluation criterion. One task offers long-duration speech from each speaker, aiming to make full use of text-independent high-level information for recognizing speakers.

In addition to short-term spectral features such as MFCC, high-level information can also serve as an effective feature for speaker recognition, but it is usually tied to the text being spoken. How to exploit high-level features for text-independent speaker recognition has therefore become a research focus. The thesis presents an effective and simple way to extract prosodic features, together with models built on them to discriminate speakers.

Given the nature of text-independent speaker recognition, the conventional probabilistic GMM-UBM model is used to compress and cluster the prosodic features, and a support vector machine (SVM) is then used to recognize speakers. The results show that this approach is effective.

The thesis introduces a method for extracting suprasegmental features with wavelet analysis, by which prosodic features of the MFCC contour, the F0 contour and the energy (E) contour are extracted.
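The wavelet step can be sketched roughly as follows. This is a minimal illustration rather than the thesis's implementation: the Haar wavelet, the number of decomposition levels, and the function names are all assumptions, since the abstract does not state which wavelet or depth was used. The sketch shows how repeated low-pass filtering and downsampling keeps only the slowly varying trend of a frame-level contour.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    signal = list(signal)
    if len(signal) % 2:
        # Pad odd-length input by repeating the last sample.
        signal.append(signal[-1])
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def approximation_coefficients(contour, levels):
    """Return the approximation coefficients after `levels` Haar decompositions.

    The detail coefficients (fast fluctuations) are discarded at every level,
    so the result describes only the slow trend of the contour.
    """
    approx = list(contour)
    for _ in range(levels):
        if len(approx) < 2:
            break  # nothing left to decompose
        approx, _detail = haar_step(approx)
    return approx


# Hypothetical 8-frame F0 contour in Hz (illustrative values, not thesis data).
f0_contour = [120.0, 122.0, 125.0, 130.0, 128.0, 124.0, 121.0, 119.0]
f0_trend = approximation_coefficients(f0_contour, levels=2)
```

With two levels, the eight hypothetical frames reduce to two coefficients, one summarizing each half of the contour. The same function could be applied to each MFCC dimension or to an energy contour.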
Because MFCC is high-dimensional, its dimensions are only weakly correlated with one another, and the vocal tract conveys the slow changes of speech, the approximation coefficients are used to form the vocal suprasegmental feature PMFCC. The F0 contour and E contour prosodic features each consist of 6 dimensions. In this way, prosodic features are extracted with wavelet analysis from the MFCC, F0 and energy contours respectively. These complementary features are fused at the feature level to yield the most effective feature, PMFCCFE, and GMM mean super-vectors of PMFCCFE are used to train SVM models that discriminate target speakers from impostors more effectively.

Experiments on the 2006 NIST 8side-1side subset show that, compared with the MFCC-based GMM-UBM system, the prosodic GMM-SVM system achieves relative improvements of 57.9% in EER and 41.4% in MinDCF.
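A GMM mean super-vector of the kind mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's code: it assumes a diagonal-covariance UBM, mean-only MAP adaptation, and a relevance factor of 16 (a commonly used value, not one given in the abstract); all function and parameter names are hypothetical. Any kernel-specific scaling of the means and the SVM training itself are omitted.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances):
    """Posterior probability of each UBM component given one feature frame."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_mean_supervector(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt the UBM means to `frames` and stack them into one vector."""
    k, d = len(means), len(means[0])
    counts = [0.0] * k                     # soft frame count per component
    sums = [[0.0] * d for _ in range(k)]   # posterior-weighted frame sums
    for x in frames:
        for c, p in enumerate(posteriors(x, weights, means, variances)):
            counts[c] += p
            for j in range(d):
                sums[c][j] += p * x[j]
    supervector = []
    for c in range(k):
        # Components that saw more data move further away from the UBM mean.
        alpha = counts[c] / (counts[c] + relevance)
        for j in range(d):
            data_mean = sums[c][j] / counts[c] if counts[c] > 0 else means[c][j]
            supervector.append(alpha * data_mean + (1.0 - alpha) * means[c][j])
    return supervector


# Toy 1-D UBM with two components (illustrative values only).
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0], [10.0]]
ubm_vars = [[1.0], [1.0]]
toy_frames = [[1.0]] * 16
toy_supervector = map_mean_supervector(toy_frames, ubm_weights, ubm_means, ubm_vars)
```

In the toy example, the sixteen frames fall almost entirely on the first component, so its mean moves halfway from 0.0 toward the data mean of 1.0, while the second component stays at its UBM value. One such vector per utterance would then serve as a fixed-length input to an SVM.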
Keywords/Search Tags: prosodic features, GMM super-vector, SVM, text-independent speaker verification