Study On Static Individual Characteristics For Speaker Recognition

Posted on: 2020-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
Full Text: PDF
GTID: 2518306518963579
Subject: Software engineering
Abstract/Summary:
An open problem in current speaker recognition is the reliance on a conventional, convenient feature set without a clear view of how speaker characteristics are produced by humans. Several feature extraction methods used in speaker recognition are adopted directly from speech recognition. Although most of them work well in practice, they are often used as “black boxes” without an exact interpretation. The feature extractor is the first component of an automatic speaker recognition system, and it determines the behavior of the rest of the system. A more adequate feature set should therefore be sought that highlights the frequency regions containing individual speaker characteristics. To that end, we explore the causal mechanisms of individual speaker characteristics, focusing on the anatomically based static features.

This thesis has two main purposes. First, we explore the relationship between types of speaker characteristics and their frequency regions, for both static and dynamic aspects, by conducting speaker verification tests and analyzing the causal mechanisms of individual speaker characteristics. This work leads us to hypothesize (hypothesis 1) that static and dynamic characteristics appear most strongly in the high-frequency and low-frequency regions of the speech signal, respectively. To test the hypothesis, we employ two filterbank shapes (high-emphasis and low-emphasis on the linear frequency scale) and a subband filterbank as feature extractors, and we conduct speaker verification tests on the TIMIT database. The results indicate that, for the static characteristics, the high-emphasis filterbank performs better than the low-emphasis one. This supports the hypothesis that static characteristics reside in the higher frequency region.

Second, this thesis examines how gender-specific characteristics contribute to the performance of automatic speaker verification (ASV). We review gender-specific anatomical variations and explore the roles of gender-related feature components in ASV by setting the following hypothesis (hypothesis 2): static speaker characteristics exist in gender-specific frequency regions, which can be tested by examining the effect of gender on ASV performance. To test the hypothesis, both vocal tract features (sub-band coefficients) and voice source features (F0, HNR, and the H1*-H2* ratio) were extracted based on anatomical knowledge. F-ratio values were then calculated for each gender, and speaker verification tests were conducted using a gender-balanced, modified version of the TIMIT database. The results indicate that the most discriminative frequency region is higher and wider for female speakers than for male speakers. The results also support our hypothesis, because clear gender effects on ASV performance were observed for both the vocal tract and the voice source components.

Certain complexities, such as noise robustness and channel mismatch, remain critical issues for speaker recognition in real applications; they are outside the scope of this thesis and are left for future work.
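As an illustration of the F-ratio analysis mentioned above, the following is a minimal sketch, not the thesis's implementation. It assumes the commonly used definition of the F-ratio (variance of the per-speaker means divided by the mean within-speaker variance, computed separately for each feature dimension such as a sub-band). The abstract does not state the exact formula used, and the function name `f_ratio` and the array layout are hypothetical.

```python
import numpy as np


def f_ratio(features, speaker_ids):
    """Per-dimension F-ratio under an assumed, commonly used definition.

    features:    array of shape (n_frames, n_dims), e.g. sub-band coefficients
    speaker_ids: array of shape (n_frames,), one speaker label per frame
    Returns an array of shape (n_dims,); larger values suggest that a
    dimension separates speakers better.
    """
    features = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    speakers = np.unique(speaker_ids)

    # Mean and (population) variance of each dimension, per speaker.
    means = np.stack([features[speaker_ids == s].mean(axis=0) for s in speakers])
    variances = np.stack([features[speaker_ids == s].var(axis=0) for s in speakers])

    between = means.var(axis=0)      # spread of the speaker means
    within = variances.mean(axis=0)  # average spread inside each speaker
    # Note: a dimension with zero within-speaker variance yields inf or nan.
    return between / within


# Toy data: two speakers, two frames each, two feature dimensions.
X = np.array([[0.0, 0.0], [2.0, 1.0], [10.0, 0.0], [12.0, 1.0]])
ids = np.array(["A", "A", "B", "B"])
print(f_ratio(X, ids))
```

In this toy example only the first dimension differs between the two speakers' means, so it receives the higher score. Computing the same quantity separately on the male and the female subsets would give a per-gender profile of discriminative dimensions, which is the kind of comparison the abstract describes.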
Keywords/Search Tags:Individual Characteristic, Speaker Recognition, Speech Production, Frequency Distribution, Static and Dynamic, Gender-specific