Research On Feature Extraction Algorithm In Speaker Recognition

Posted on: 2017-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: W J Huang
GTID: 2358330512467948
Subject: Signal and Information Processing

Abstract/Summary:
Speaker recognition discriminates between speakers by exploiting differences in their voice features, and draws on physiology, acoustics, and phonetics. Compared with iris recognition and fingerprint recognition, it is simpler and more convenient. The key problem in a speaker recognition system is to extract the speakers' feature parameters accurately. This thesis uses the Mel Frequency Cepstral Coefficient (MFCC), which is derived from the human auditory mechanism and reflects how the ear actually perceives sound. For the recognition model, the support vector machine (SVM) was selected because of its strength on small-sample, nonlinear pattern recognition problems. Following the MFCC extraction process and the nonlinear characteristics of speech, the thesis investigates the following:

(1) Because the window function used in the Mel filter bank may affect recognition performance, the triangular, Hanning, and Hamming windows were each used to design the Mel filter bank. Simulation experiments showed that the Hamming-window design gave better recognition performance.

(2) Compared with the Fourier transform, wavelet analysis is much better suited to processing nonlinear and non-stationary signals. Based on the principle of the wavelet transform and the correspondence between the nodes of the wavelet packet decomposition tree and the frequency bands of the signal, a new feature parameter, the wavelet packet transform coefficient (WPTC), was obtained. Simulation experiments indicated that WPTC achieves much better recognition performance than MFCC.

(3) The traditional MFCC feature parameter does not reflect the nonlinear characteristics of the speech signal. Empirical mode decomposition (EMD) was used to isolate the high-frequency part of the speech signal, and the fractal dimension (FD) was used to describe the nonlinear characteristics of that high-frequency information, yielding the feature parameter EMD-FD. Fusing the traditional MFCC with EMD-FD forms a higher-dimensional feature space. Simulation experiments indicated that the average recognition rate of the parameter fused with the nonlinear feature improved by about 2% compared with MFCC.
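
The window comparison in point (1) can be illustrated with a minimal sketch. It is not the thesis's implementation. The function names, the defaults (24 filters, 512-point FFT, 16 kHz sampling), the common 2595·log10(1 + f/700) Mel formula, and the choice to lay a symmetric window across each filter's bin range are all assumptions made for illustration. A conventional triangular Mel filter peaks at its Mel-spaced centre bin, so the symmetric triangle used here is a simplification.

```python
import numpy as np


def hz_to_mel(f):
    """Convert frequency in Hz to the Mel scale (common 2595*log10 form)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filter_bank(n_filters=24, n_fft=512, sample_rate=16000, shape="triangle"):
    """Return an (n_filters, n_fft // 2 + 1) matrix of Mel-spaced filters.

    `shape` selects the window laid over each filter's bin range:
    "triangle" (np.bartlett), "hanning" (np.hanning) or "hamming" (np.hamming).
    All names and defaults here are hypothetical, chosen for illustration.
    """
    windows = {"triangle": np.bartlett, "hanning": np.hanning, "hamming": np.hamming}
    if shape not in windows:
        raise ValueError(f"unknown window shape: {shape!r}")

    # n_filters + 2 edge frequencies, equally spaced on the Mel scale.
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_edges) / sample_rate).astype(int)

    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, hi = bins[i], bins[i + 2]  # filter i spans edge i to edge i + 2
        bank[i, lo:hi + 1] = windows[shape](hi - lo + 1)
    return bank
```

Under these assumptions, the filter-bank energies for one frame would be `mel_filter_bank(shape="hamming") @ power_spectrum`, where `power_spectrum` is a hypothetical length-257 power spectrum of that frame. Taking the logarithm and a discrete cosine transform of those energies gives MFCC-style coefficients. Swapping the `shape` argument is then the only change needed to compare the three window designs.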
Keywords/Search Tags: speaker recognition, Mel Frequency Cepstral Coefficient, wavelet packet transform, Empirical Mode Decomposition, Fractal Dimension