
Research On Speaker Verification System Based On Perceptual Log Area Ratio

Posted on: 2014-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: C Yin
GTID: 2268330401977116
Subject: Information and Communication Engineering
Abstract/Summary:
Speaker recognition, also called "voiceprint recognition", is a biometric technology that identifies and verifies a speaker's identity from his or her utterances. It is widely used because speech is versatile, unique, highly available, and easy to collect. In recent years, with the steady advance of science and technology, applications of speaker recognition have developed rapidly, and it has gradually become one of the most important and convenient means of security validation in daily life. However, as the technology develops, its different application fields place ever higher requirements on it, which makes further progress more difficult. On the one hand, a speaker's features change with time and age, and they are also affected by the speaker's mood and state of health. On the other hand, many external factors seriously degrade the real-time performance of speaker recognition systems: background noise, the length of the training and testing data, speech signal distortion introduced by communication channels, imitated speech, and dialect interference. This dissertation studies speaker feature extraction and the noise robustness of a speaker verification system based on perceptual log area ratio (PLAR) coefficients.

Speaker verification systems based on MFCC achieve high performance in clean conditions, but their performance degrades sharply in noisy environments. The dissertation first extracts the perceptual log area ratio, using human auditory perception theory to characterize speaker-specific information, and then studies its noise robustness. Based on their respective discriminative abilities, PLAR and MFCC are combined to improve system performance in noisy conditions. The results show that PLAR and MFCC are complementary: fusing the two features in the feature domain and in the score domain effectively improves the performance of speaker verification systems in noisy conditions.

To improve the robustness of PLAR in noisy environments, a new feature called MTPLAR is proposed. It replaces the Hamming-windowed DFT spectrum with a multitaper spectrum estimate in the front end of the speaker recognition system. The multitaper method applies multiple window functions (tapers) and forms the spectrum estimate as a weighted average of the resulting spectra in the frequency domain; the more robust spectrum estimate in turn yields a more robust feature. Experimental results show that a speaker verification system based on the proposed method achieves a better recognition rate and greater robustness than one based on ordinary PLAR.
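
The multitaper spectrum estimate described above can be sketched roughly as follows in Python. The abstract does not state which tapers or weights the thesis uses, so the sine tapers, the uniform weights, the function names, and the frame length and sampling rate in the usage lines are all assumptions made for illustration; this is not the thesis's implementation.

```python
import numpy as np


def sine_tapers(frame_len, num_tapers):
    """Return an array of shape (num_tapers, frame_len) of orthonormal sine tapers.

    Assumption: sine tapers are used here only because they have a simple
    closed form; the thesis may use a different taper family.
    """
    n = np.arange(1, frame_len + 1)              # sample index 1..N
    k = np.arange(1, num_tapers + 1)[:, None]    # taper index 1..K, as a column
    return np.sqrt(2.0 / (frame_len + 1)) * np.sin(np.pi * k * n / (frame_len + 1))


def multitaper_spectrum(frame, num_tapers=6, weights=None):
    """Weighted average of the power spectra of one frame under several tapers."""
    frame = np.asarray(frame, dtype=float)
    tapers = sine_tapers(len(frame), num_tapers)
    if weights is None:
        # Assumption: uniform weights; other weighting schemes are possible.
        weights = np.full(num_tapers, 1.0 / num_tapers)
    # One power spectrum per taper, shape (num_tapers, frame_len // 2 + 1).
    per_taper = np.abs(np.fft.rfft(tapers * frame, axis=1)) ** 2
    return weights @ per_taper


def hamming_spectrum(frame):
    """Single-taper baseline: Hamming-windowed DFT power spectrum."""
    frame = np.asarray(frame, dtype=float)
    return np.abs(np.fft.rfft(np.hamming(len(frame)) * frame)) ** 2


# Usage on a synthetic frame (hypothetical values: 8 kHz sampling, 256 samples).
fs, frame_len = 8000, 256
t = np.arange(frame_len) / fs
frame = np.sin(2 * np.pi * 1000 * t)
mt = multitaper_spectrum(frame, num_tapers=6)
hw = hamming_spectrum(frame)
```

Either spectrum could then feed the rest of a feature extraction front end. Averaging over several tapers lowers the variance of the estimate at the cost of some frequency resolution, which is the trade-off behind the robustness claim.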
Keywords: speaker recognition, speaker verification, perceptual log area ratio, multitaper spectrum estimate, feature extraction, robustness, fusion