Research On Feature Extraction And Robust Technology For Speaker Identification

Posted on: 2010-11-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y P Li
GTID: 1118360302999489
Subject: Pattern Recognition and Intelligent Systems
Abstract:
Speech is a major source of information for people, and it is also the most convenient, effective and natural means of communication. Speech recognition identifies the content of speech in order to facilitate communication between people and machines. Speaker recognition is a special form of speech recognition in which a machine recognizes a person from a spoken phrase. Speaker recognition technology has made great progress over the past thirty years; at the same time, the growth of practical applications demands higher performance. On the one hand, variability in a speaker's pronunciation makes the extraction of discriminative features the key factor in ensuring system performance. On the other hand, many disturbance factors, such as environmental noise, the length of the training and testing data, and communication channel mismatch, seriously degrade speaker recognition performance in practical applications. This dissertation focuses on text-independent speaker identification, covering both the extraction of speaker characteristics and noise robustness. The main research results fall into four areas:

1. A speaker identification algorithm based on feature transformation and a fuzzy least-squares support vector machine (LS-SVM) is presented to overcome the limitation of the LS-SVM on large samples of speech data. Training an LS-SVM requires solving a set of linear equations in which the number of variables equals the number of training samples, so this work proposes a feature transformation based on the Gaussian mixture model (GMM). At the same time, a fuzzy membership function is introduced into the LS-SVM to handle the unclassifiable regions that arise in multi-class problems. The GMM is a classical generative model; it effectively reduces the amount of feature data and highlights speaker characteristics, because the clustering result is the set of Gaussian mean vectors. The proposed algorithm combines the advantages of generative and discriminative models. Experimental results demonstrate that the fuzzy LS-SVM has better discriminative and generalization ability.

2. A noise-robust method of perceptual feature compensation transformation based on the GMM is proposed. Based on an analysis of human auditory perception, the perceptual linear prediction (PLP) model uses three steps to reflect human perception of sound. This work modifies PLP at the feature extraction stage by removing the critical-band spectral resolution analysis, and then extracts the modified perceptual log area ratio (MPLAR). Furthermore, in view of the acoustic characteristics of speaker recognition, a nonlinear transformation is applied to the output likelihood scores. The transformation widens the score ratio between the target model and non-target models and, by taking the whole distribution of scores into account, keeps the frame scores for the same model close together. Each model score therefore depends not only on the current likelihood score but also on the scores of the previous K frames, which overcomes the limited robustness and stability of the MPLAR feature under different noise environments. Combining the perceptual feature with model compensation provides discriminative features, stabilizes the model scores, and improves the recognition rate and robustness of the recognition system.

3. A robust algorithm based on self-adaptive frequency warping is introduced. Although the Mel-frequency and PLP features take the characteristics of human auditory perception into account and improve recognition performance to some extent, they cannot treat semantic information and speaker-specific characteristics differently, and they pay little attention to high-frequency information. This work presents a new discriminative feature based on adaptive frequency warping. We analyze the relationship between frequency components and individual characteristics and quantify this dependency. The new feature is extracted with non-uniform sub-band filters designed according to the adaptive frequency warping in different frequency bands. In addition, we apply pre-enhancement before the feature extraction module. A series of controlled experiments shows that the warping algorithm is reasonable and interpretable, and that the proposed feature is insensitive to spoken content and thus more discriminative and robust. The experimental results demonstrate that combining pre-enhancement with the proposed feature noticeably improves the speaker recognition rate and robustness.

4. A novel speaker recognition framework based on a Chinese vowel mapping technique is proposed. The framework rests on decomposing Chinese multi-vowels into single-vowel phonemes. In Chinese pronunciation, all syllables have a simple and stable phonetic structure, and the vowel part carries most of the energy and duration. We find that, under short-term analysis, Chinese diphthongs and multi-vowels can be approximated as a combination of vowels and transitional parts, and we build a new Chinese vowel mapping table from multi-vowels to single-vowel phonemes. Based on this mapping table, we succeed in separating speaker identity information from semantic information. This is a novel way to transform a text-independent system into a text-dependent speaker recognition system, and it can be reused by industry and other researchers. Within the new framework, we propose a Chinese speaker identification system based on biomimetic pattern recognition and improve the nearest neighbor algorithm to find an effective cover of each phoneme in the eigenspace for every speaker. During the identification phase, the final decision is made according to the relation between the cover and the feature characteristics. Experimental results demonstrate that the Chinese vowel mapping theory is valid and meaningful, and that the new system effectively reduces the amount of data required and avoids disturbance from impostors.
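
To illustrate why the first contribution reduces the training set before applying the LS-SVM, the following is a minimal sketch of the standard two-class LS-SVM classifier, whose training step solves one linear system with N + 1 unknowns for N training vectors. It is not the dissertation's implementation: the RBF kernel, the values of `gamma` and `sigma`, the plain Gaussian-elimination solver and all function names are illustrative assumptions, and the fuzzy membership function is omitted.

```python
import math

def rbf(a, b, sigma=1.0):
    """RBF kernel between two vectors (kernel choice is an assumption)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Build and solve the (N+1) x (N+1) LS-SVM system; returns (b, alpha)."""
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]          # first row: [0, y^T]
    for i in range(n):
        row = [float(y[i])]                      # first column: y
        for j in range(n):
            k = y[i] * y[j] * rbf(X[i], X[j], sigma)
            row.append(k + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]

def lssvm_predict(x, X, y, b, alpha, sigma=1.0):
    """Sign of the LS-SVM decision function at x."""
    s = sum(a * yi * rbf(x, xi, sigma) for a, yi, xi in zip(alpha, y, X)) + b
    return 1 if s >= 0 else -1

# Toy data standing in for a handful of GMM mean vectors per class.
X = [[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]]
y = [-1, -1, 1, 1]
b, alpha = lssvm_train(X, y)
print(lssvm_predict([0.0, 0.5], X, y, b, alpha),
      lssvm_predict([3.0, 3.5], X, y, b, alpha))
```

Because the system grows with the number of training vectors, replacing thousands of speech frames with a small set of GMM mean vectors per speaker keeps the matrix small, which is the motivation stated in the abstract.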
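
The idea of non-uniform sub-band filters in the third contribution can be sketched by comparing conventional mel-spaced filter centres with centres placed according to a per-band weight. The abstract does not give the dissertation's warping function, so the weighting scheme here is purely an assumption: `band_weights` is a hypothetical measure of how speaker-discriminative each band is, and `adaptive_centers` is an illustrative name. Only the mel formula is the commonly used standard one.

```python
import math

def hz_to_mel(f):
    """Common mel-scale formula."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_centers(n_filters, f_max):
    """Filter centre frequencies equally spaced on the mel scale."""
    step = hz_to_mel(f_max) / (n_filters + 1)
    return [mel_to_hz(step * (i + 1)) for i in range(n_filters)]

def adaptive_centers(band_edges, band_weights, n_filters):
    """Place centres so filter density in each band is proportional to its weight.

    band_edges: increasing frequencies in Hz, one more entry than band_weights.
    band_weights: positive, hypothetical per-band discriminability scores.
    """
    # Warped axis: each band contributes weight * bandwidth.
    cum = [0.0]
    for lo, hi, w in zip(band_edges[:-1], band_edges[1:], band_weights):
        cum.append(cum[-1] + w * (hi - lo))
    total = cum[-1]
    centers = []
    for i in range(n_filters):
        target = total * (i + 1) / (n_filters + 1)  # equal steps on warped axis
        b = 0
        while cum[b + 1] < target:
            b += 1
        frac = (target - cum[b]) / (cum[b + 1] - cum[b])
        centers.append(band_edges[b] + frac * (band_edges[b + 1] - band_edges[b]))
    return centers

mel = mel_centers(7, 4000.0)
# Hypothetical weights that emphasise the 2-4 kHz band.
adapt = adaptive_centers([0.0, 1000.0, 2000.0, 4000.0], [1.0, 1.0, 3.0], 7)
print([round(c) for c in mel])
print([round(c) for c in adapt])
```

Mel spacing becomes sparser as frequency rises, which matches the abstract's remark that such features pay little attention to high-frequency information; a weight-driven warping can instead concentrate filters wherever the weights are largest.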
Keywords/Search Tags:Speaker Recognition, Speaker Identification, Fuzzy Least-Squares Support Vector Machine, Gaussian Mixture Model, Perceptual Linear Predictive, Robustness, Model Compensation, Self-Adaptive Frequency Warping, Chinese Vowel Mapping