
Factor Analysis For Text-independent Speaker Identification Method

Posted on: 2015-02-19 | Degree: Master | Type: Thesis
Country: China | Candidate: T T Liu
GTID: 2268330431450063 | Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Text-independent speaker identification is the task of identifying a target speaker from the characteristics of his or her utterances rather than from their semantic content. Because it requires no physical contact, speaker recognition has opened a new era of biometric authentication. As networks and transmission channels grow more complex, however, conventional methods perform poorly, so robustness has recently become a hotspot in speaker recognition research.

Factor analysis is used mainly to extract compact representations of speakers, referred to as i-vectors or total variability factors. The i-vector, originally proposed by Dehak in 2010, was motivated by joint factor analysis (JFA). Unlike JFA, however, the i-vector approach defines only a single space, termed the total variability space, so a given utterance can be represented by a low-dimensional vector in that space. The experiments show that factor analysis can effectively mitigate the mismatch between the training and testing environments.

To obtain an i-vector, the first step is to construct a Gaussian mixture model (GMM) for each speaker; the GMM is initialized by combining the LBG algorithm with fuzzy theory. Because the total variability space also contains channel information, channel compensation, whose goal is to separate speaker characteristics from channel characteristics, is indispensable. The channel compensation techniques considered are linear discriminant analysis (LDA), principal component analysis (PCA), within-class covariance normalization (WCCN), and nuisance attribute projection (NAP). The thesis compares these compensation techniques and also discusses a variety of identification methods, including vector quantization, log-likelihood scoring, support vector machines, and cosine distance scoring. The experimental results show that LDA followed by WCCN can achieve satisfactory performance.
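The channel-compensation pipeline that the abstract reports as performing best, LDA followed by WCCN, can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names `lda_projection`, `within_class_covariance`, and `wccn_projection` are hypothetical, the i-vectors are assumed to be already extracted as the rows of a matrix `X` with one speaker label per row, and WCCN follows the common formulation in which the Cholesky factor B of the inverse within-class covariance (B B^T = W^{-1}) serves as the projection.

```python
import numpy as np


def lda_projection(X, labels, n_components):
    """Return a (d, n_components) LDA matrix whose columns are discriminant directions.

    X: (n, d) array of i-vectors, one row per utterance (assumed input layout).
    labels: (n,) array of speaker ids.
    """
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))  # between-speaker scatter
    Sw = np.zeros((d, d))  # within-speaker scatter (assumed nonsingular)
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        diff = (mc - mu)[:, None]
        Sb += Xc.shape[0] * (diff @ diff.T)
        centered = Xc - mc
        Sw += centered.T @ centered
    # Solve the generalized eigenproblem Sb v = lambda Sw v by whitening
    # with the Cholesky factor L of Sw (Sw = L L^T), then v = L^{-T} u.
    L_inv = np.linalg.inv(np.linalg.cholesky(Sw))
    eigvals, U = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    order = np.argsort(eigvals)[::-1][:n_components]  # largest ratios first
    return L_inv.T @ U[:, order]


def within_class_covariance(X, labels):
    """Average over speakers of each speaker's covariance (1/n_s normalization)."""
    classes = np.unique(labels)
    d = X.shape[1]
    W = np.zeros((d, d))
    for c in classes:
        Xc = X[labels == c]
        centered = Xc - Xc.mean(axis=0)
        W += centered.T @ centered / Xc.shape[0]
    return W / len(classes)


def wccn_projection(X, labels):
    """Return B with B B^T = W^{-1}; compensated row vectors are X @ B."""
    W = within_class_covariance(X, labels)
    return np.linalg.cholesky(np.linalg.inv(W))
```

In use, one would compute `A = lda_projection(X, y, k)`, project with `Z = X @ A`, then compute `B = wccn_projection(Z, y)` and score with `Z @ B`. After this step the within-class covariance of the compensated vectors is the identity matrix, which is the property WCCN is meant to enforce; running LDA first leaves at most (number of speakers - 1) meaningful discriminant directions.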
Cosine distance scoring with score normalization not only gives better results but also simplifies the decision process. Finally, a graphical user interface for the training and testing modules is implemented in simulation.
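Cosine distance scoring with score normalization can be sketched in plain Python as follows. The abstract does not say which normalization the thesis uses; this sketch assumes zero normalization (z-norm), in which each enrolled speaker's raw cosine score is standardized by the mean and standard deviation of that speaker's scores against an impostor cohort. The names `cosine_score`, `znorm_score`, and `identify`, and the cohort argument, are hypothetical.

```python
import math
from statistics import mean, stdev


def cosine_score(a, b):
    """Cosine similarity between two (compensated) i-vectors given as sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def znorm_score(raw, model, cohort):
    """Assumed z-norm: standardize a raw score by the model's impostor-cohort statistics."""
    cohort_scores = [cosine_score(model, imp) for imp in cohort]
    return (raw - mean(cohort_scores)) / stdev(cohort_scores)


def identify(test_vec, enrolled, cohort):
    """Closed-set identification: return the speaker with the highest normalized score.

    enrolled: dict mapping speaker id -> enrolled i-vector (hypothetical layout).
    cohort: list of impostor i-vectors (needs at least two for stdev).
    """
    scores = {
        spk: znorm_score(cosine_score(model, test_vec), model, cohort)
        for spk, model in enrolled.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

Because each decision reduces to an inner product plus a per-speaker affine correction, no classifier has to be trained at test time, which is one reading of the abstract's claim that cosine scoring simplifies the decision process.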
Keywords/Search Tags: speaker identification, i-vector, channel compensation, support vector machine, cosine distance scoring