A New Non-linear Spectrum Transformation For Speaker Recognition

Posted on: 2008-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: D M Yuan
Full Text: PDF
GTID: 2178360218451043
Subject: Signal and Information Processing
Abstract/Summary:
Speaker recognition identifies or verifies who is speaking by analyzing individual information extracted from the speaker's voice. Because of its advantages in convenience, economy, and extensibility, it can be applied in many fields, such as security control, electronic banking, the military, and forensics. This thesis analyzes the contribution of several existing non-linear spectrum transformations to speaker recognition and proposes a new non-linear spectrum transformation.

Feature selection, feature extraction, and speaker modeling are the key techniques in a speaker recognition system. Cepstral coefficients, which reflect the vocal tract response, are now widely used in speaker recognition, especially Mel frequency cepstral coefficients (MFCC), because of their auditory perceptual characteristics. Although MFCC achieves good results, it represents speech information in general and does not emphasize the speaker's individual information. The performance of LPC, LPCC, and MFCC is evaluated first. Then the performance of three non-linear spectrum transformations (Mel scale, Bark scale, and ERB scale frequency transformation) is compared for different test durations and amounts of training data.

The contribution of each frequency band to speaker recognition is analyzed according to human auditory characteristics. Bands that are more important to speaker recognition are enhanced, and bands that are less important are attenuated. Based on this analysis, the Bark scale is modified by enhancing or attenuating frequency sub-bands, and a new non-linear spectrum transformation with a corresponding feature extraction algorithm is proposed. Experimental results show that the new transformation improves performance compared with the classical non-linear spectrum transformations. Under the same conditions, the average error rate falls to 0.668 percent, and the error rate tends to zero as the test time increases. The performance of the speaker recognition system is substantially improved.
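The three classical frequency warpings compared in the abstract can be illustrated with a short sketch. The formulas below are commonly cited textbook approximations (O'Shaughnessy's Mel formula, the Zwicker and Terhardt Bark approximation, and the Glasberg and Moore ERB-rate formula). The abstract does not state which variants the thesis uses, so these choices, along with the function names, are assumptions for illustration only. The modified Bark scale proposed in the thesis is not reproduced here, because the abstract does not give its sub-band weights.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Mel scale (assumed variant: 2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_bark(f_hz: float) -> float:
    """Bark scale (assumed variant: Zwicker and Terhardt arctangent approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def hz_to_erb_rate(f_hz: float) -> float:
    """ERB-rate scale (assumed variant: Glasberg and Moore, 21.4 * log10(1 + 0.00437 f))."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


if __name__ == "__main__":
    # Print the warped value of a few linear frequencies on each scale.
    for f in (100.0, 500.0, 1000.0, 4000.0, 8000.0):
        print(f"{f:7.0f} Hz  mel={hz_to_mel(f):8.2f}  "
              f"bark={hz_to_bark(f):6.2f}  erb={hz_to_erb_rate(f):6.2f}")
```

All three mappings are monotonic and compress high frequencies relative to low ones, which is why they allocate finer resolution to the lower part of the spectrum when used to place filter-bank center frequencies.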
Keywords/Search Tags: Speaker Recognition, Non-linear Spectrum Transform, Vector Quantization, Gaussian Mixture Model