
Research Of Speaker Identification Models Based On Kernel Methods

Posted on: 2011-10-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J W Zheng
Full Text: PDF
GTID: 1118330338977932
Subject: Control theory and control engineering
Abstract/Summary:
Because of its flexibility, accuracy and economy, speaker recognition is regarded as the most natural kind of biometrics, with broad application prospects in secure access, forensic evaluation, electronic surveillance and financial services. Recently, research on speaker recognition systems has moved from theory to practice, and user demands keep growing as operating conditions change. A higher recognition rate is no longer the only criterion: real-time performance, convenience and the expandability of the system model cannot be neglected either.

Over the last ten years, many classification algorithms based on kernel functions have been proposed. They effectively overcome the local-minimum problem and the incomplete statistical analysis of traditional pattern recognition models. These algorithms have strong nonlinear modeling capacity, which meets the demands of speech features. Speaker recognition systems based on kernel methods such as the support vector machine (SVM) have therefore proved very successful.

In this thesis, we focus on improvements in the model domain and propose several kernel classification methods that can be applied to speaker identification when only a small speech corpus is available. The main contributions of the work are as follows:

1. Analyzed the leading speaker recognition models, GMM-UBM and SVM. The generative Gaussian mixture model (GMM) has been the baseline technology for the last decade, but it needs a large amount of input speech data. GMM-UBM reduces the amount of input data needed for the target speaker and performs better than the GMM. The discriminative SVM has many merits, including a maximum classification margin, a global solution and sparsity. When applied to small-sample speaker recognition, it gives even better results than GMM-UBM.
We analyzed in depth the principles, performance, mixture strategies and application details of these models. The final experiments show that GMM-UBM is slow at test time and that SVM has poor expandability for multi-class classification.

2. Proposed a hybrid GMM and RVM strategy for speaker identification. The relevance vector machine (RVM) classifier uses probabilistic outputs to overcome a shortcoming of SVM and is also sparser. However, RVM has excessive computational complexity and memory requirements in text-independent speaker identification because of the large number of training samples. To solve this problem, a hybrid GMM/RVM approach is proposed that effectively extracts the speaker feature vectors and also solves the storage problem. Furthermore, the hybrid approach combines the robustness of the generative model with the strong classification ability of the discriminative model to improve identification performance and robustness. The experiments show that this method has a lower error rate than the GMM system and is sparser than the state-of-the-art GMM/SVM system.

3. Proposed a multi-class kernel logistic regression speaker identification model. The traditional logistic regression model is extended to a multi-class kernel logistic model for text-independent speaker identification, a task that is nonlinear and involves more than two classes. An L2 penalty term is added to improve the generalization ability of the model. A new iterative algorithm is then proposed based on the solution of a dual problem, using ideas similar to those of the Sequential Minimal Optimization algorithm for SVM. Experiments show that the algorithm is robust and fast, and that its recognition rate in text-independent speaker identification is as good as that of widely used methods such as SVM.

4. Proposed a probabilistic sparse kernel logistic speaker identification model.
A truly sparse multiclass formulation is introduced based on multinomial logistic regression. It combines weighted sums of basis functions with sparsity-promoting priors that encourage the weight estimates to be either significantly large or exactly zero. A bottom-up training algorithm is then adopted that controls the capacity of the learned classifier by minimizing the number of basis functions used, resulting in better generalization and faster computation. Experimental results on standard benchmark data sets and on speaker identification show that the proposed method has the best real-time performance.

5. Proposed a local within-class feature-preserving kernel Fisher discriminant algorithm and applied it to speaker identification. Dimensionality reduction without losing the intrinsic information in the original data is an important technique for subsequent tasks such as classification. A novel algorithm is proposed after an in-depth analysis of the relationship between the kernel Fisher discriminant (KFD) and the kernel locality preserving projection (LPP). The new method keeps the global projection ability of KFD and introduces the locality-preserving ability of LPP, so it works well on overlapped or multimodal labeled data. The training algorithm is improved to resolve the out-of-memory problem in large-sample settings. The speaker identification application shows that the proposed algorithm is more adaptable and achieves an improved recognition rate and speed.

6. Proposed an enhanced data domain description speaker identification method. The purpose of data description is to give a compact description of the target data that represents most of its characteristics. In support vector data description (SVDD), the compact description of the target data is given by a hyperspherical model, which is determined by a small portion of the data called support vectors.
Despite its usefulness, conventional SVDD may not find the optimal description of the target data, especially when the support vectors do not reflect the overall characteristics of the target data. To address this issue, an enhanced SVDD is proposed that introduces new distance measures based on the notion of a relative density degree for each data point, in order to reflect the distribution of the given data set. The enhanced SVDD is compared experimentally with GMM because both can be applied to open-set speaker identification. The results show that the enhanced SVDD outperforms GMM in recognition rate, real-time performance and the amount of training data required.
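
The relative density degree mentioned under contribution 6 can be illustrated with a small sketch. The abstract does not give the exact definition used in the thesis, so the formula below is an assumption: each point's density is taken from the distance to its k-th nearest neighbour, relative to the data-set average. The function name `relative_density_degree` and the parameters `k` and `omega` are hypothetical.

```python
import numpy as np

def relative_density_degree(X, k=2, omega=1.0):
    """Hypothetical helper: one possible relative density degree per point.

    Assumed definition (not taken from the thesis):
        rho_i = exp(omega * mean_k_dist / k_dist_i)
    where k_dist_i is the distance from x_i to its k-th nearest neighbour
    and mean_k_dist is the average of k_dist_i over the data set.
    Points in dense regions (small k_dist_i) receive a larger rho_i.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbour.
    k_dist = np.sort(dist, axis=1)[:, k]
    # Guard against duplicate points, which would give a zero distance.
    k_dist = np.maximum(k_dist, 1e-12)
    return np.exp(omega * k_dist.mean() / k_dist)

# Four tightly grouped points and one isolated point.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
rho = relative_density_degree(X, k=2)
```

In a density-weighted data description, weights of this kind would typically scale each point's distance to the hypersphere centre, so that points in dense regions pull the boundary more strongly than isolated ones. How the thesis combines them with the SVDD objective is not stated in the abstract.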
Keywords/Search Tags:Speaker Identification, Kernel Trick, Relevance Vector Machine, Kernel Logistic Regression, Kernel Fisher Discriminant Analysis, Support Vector Data Domain Description