Study On Speech Feature Extraction Algorithm In Speaker Recognition System

Posted on: 2010-05-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Wang
GTID: 1118360272996208
Subject: Communication and Information System
Abstract/Summary:
Speaker recognition is a biometric identification technology that determines a person's identity from his or her voice, which carries physiological and behavioral characteristics specific to each individual. One important application is deciding whether a speaker is authorized to access secure or confidential systems. A spoken password has advantages over a password typed on a keyboard: it cannot be forgotten and is not easily stolen. Speaker recognition is therefore a very promising area of research.

Most speaker recognition systems are designed for ideal conditions and readily achieve high accuracy in a controlled, quiet laboratory. In real-life use, however, there is bound to be a mismatch between training and testing conditions, and background noise causes recognition accuracy to drop sharply. This is the major obstacle to the commercial use of speaker recognition systems, so improving their robustness is both significant and necessary. This dissertation focuses on improving the recognition rate and robustness of speaker recognition systems in several respects. Its main contributions are as follows.

1. An endpoint detection algorithm is proposed that builds on traditional methods by combining expanded spectral subtraction with a dynamic threshold derived from the speech absence probability (SAP). The expanded spectral subtraction is based on a noise compensation structure, which allows the noise to be estimated while speech is present. Endpoint detection is then performed with an SAP soft decision, which improves its robustness and precision. Experiments show that good performance is obtained even at an SNR of -10 dB, which traditional double-threshold methods cannot achieve at the same SNR.

2. Pitch detection under noisy conditions is one of the most difficult problems in speech signal processing. A new pitch detection algorithm for noisy speech at low SNR is proposed, based on the Reverse CAMDF Autocorrelation Function (RCAF) and a tentative-search smoothing measure. The algorithm employs expanded spectral subtraction based on the noise compensation structure, so it can estimate the noise during speech presence. RCAF improves the robustness and precision of pitch detection. A number of experiments show that RCAF achieves higher efficiency and better detection accuracy at an SNR of -10 dB, a level of performance that the traditional AMDF, CAMDF and AWAC methods cannot reach at the same SNR.

3. Auditory filters play an important role in understanding the mechanism of hearing, in auditory modeling, and in speaker recognition. Digital implementations of the linear gammatone and gammachirp filters are a regular part of auditory models and can be used for sound processing in cochlear implants. The dissertation studies the gammatone and gammachirp auditory filters, including their definitions, amplitude-frequency responses, and ability to simulate the filtering characteristics of the basilar membrane. It also compares the two filters and explains how they are related and how they differ. For two infinite impulse response (IIR) filter designs, it evaluates how closely the digital impulse, magnitude, and phase responses match those of the analog gammatone and gammachirp filters. The gammachirp filter is implemented as an IIR filter with a small number of coefficients. The results show that a gammatone filter combined with an IIR asymmetric compensation filter approximates the gammachirp filter very closely.

4. An auditory-based feature extraction algorithm is developed that exploits human auditory characteristics to improve speaker identification performance. The sub-band energies of the auditory features are calculated with gammatone and gammachirp filter banks instead of the commonly used triangular filter bank. The center frequencies and bandwidths are determined according to the equivalent rectangular bandwidth (ERB). The proposed features are compared with two commonly used features, LPCC and MFCC, in a text-independent speaker identification system. The simulation results show that both proposed features outperform LPCC and MFCC and are more robust in noisy environments with low SNR.

5. The auditory features have the drawback of high dimensionality. To reduce computational complexity, two methods are used to extract low-dimensional speaker features: principal component analysis (PCA), a multivariate statistical analysis technique, and the discrete cosine transform (DCT). First- and second-order delta cepstra and the shifted delta cepstrum are also derived from these auditory features. Compared with standard Mel-frequency cepstral coefficients, the auditory features yield a higher recognition rate in a speaker recognition system. The feature set also has better classification and robustness characteristics than traditional speech features.
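The auditory feature pipeline described in contributions 4 and 5 (ERB-spaced filter bank, sub-band energies, DCT compression) can be illustrated with a minimal sketch. It is not the dissertation's implementation: the abstract gives no constants, so the Glasberg and Moore ERB formula, the 4th-order gammatone with bandwidth 1.019 ERB, the truncated FIR approximation (the dissertation uses IIR designs), the 24 bands, the 12 coefficients, the 100 Hz lower edge, and all function names are assumptions made for illustration. Only the gammatone bank is shown; the gammachirp variant is omitted.

```python
import numpy as np


def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore constants, assumed)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def hz_to_erb_number(f_hz):
    """Map a frequency in Hz to the ERB-number scale."""
    return 21.4 * np.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_number_to_hz(e):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37


def erb_center_frequencies(f_low, f_high, n_bands):
    """Center frequencies equally spaced on the ERB-number scale."""
    e = np.linspace(hz_to_erb_number(f_low), hz_to_erb_number(f_high), n_bands)
    return erb_number_to_hz(e)


def gammatone_impulse_response(fc, fs, order=4, duration=0.025):
    """Truncated (FIR) gammatone impulse response, normalized to unit energy."""
    t = np.arange(int(round(duration * fs))) / fs
    b = 1.019 * erb(fc)  # bandwidth parameter, a common choice (assumed)
    g = t ** (order - 1) * np.exp(-2.0 * np.pi * b * t) * np.cos(2.0 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))


def gammatone_band_energies(frame, fs, n_bands=24, f_low=100.0):
    """Filter one frame with each gammatone band and return (center freqs, energies)."""
    fcs = erb_center_frequencies(f_low, 0.45 * fs, n_bands)
    energies = np.array([
        np.sum(np.convolve(frame, gammatone_impulse_response(fc, fs), mode="same") ** 2)
        for fc in fcs
    ])
    return fcs, energies


def auditory_cepstrum(energies, n_ceps=12):
    """Compress log sub-band energies with an unnormalized DCT-II."""
    n_bands = len(energies)
    log_e = np.log(energies + 1e-12)  # small floor avoids log(0)
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_bands)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2.0 * n_bands))
    return basis @ log_e


if __name__ == "__main__":
    fs = 16000
    frame = np.sin(2 * np.pi * 1000.0 * np.arange(400) / fs)  # 25 ms, 1 kHz tone
    fcs, energies = gammatone_band_energies(frame, fs)
    print(auditory_cepstrum(energies))
```

Keeping only the first few DCT coefficients is what reduces the dimensionality here; a PCA projection learned from training data could replace the fixed cosine basis, and delta features would be computed across consecutive frames of these coefficients.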
Keywords/Search Tags: Speech signal processing, expanded spectral subtraction, reverse CAMDF autocorrelation function (RCAF), auditory filter model, Gammatone filter, Gammachirp filter, dimension compressing