
Research On Subspace-based Speaker Adaptation Techniques

Posted on: 2015-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: X K Yang
Full Text: PDF
GTID: 2308330482979164
Subject: Signal and Information Processing
Abstract/Summary:
Speaker adaptation techniques improve the performance of Continuous Speech Recognition (CSR) systems by using adaptation data to adjust the acoustic model or the features, compensating for the mismatch between training and testing conditions. Subspace-based speaker adaptation techniques are attracting increasing interest because they suit situations in which only limited adaptation data is available. This dissertation focuses on developing practical subspace-based adaptation methods. The main contents are as follows:

Eigenvoices, the typical subspace-based adaptation method, may over-fit and degrade system performance when very little adaptation data is available. To solve this problem, a regularized Eigenvoices adaptation method is presented. A new objective function is constructed by adding an appropriate regularization term to the likelihood function. Optimizing this objective function yields a speaker factor that is more robust than the one obtained by the original method, especially when adaptation data is insufficient. Language recognition experiments on NIST LRE 2003 show that the regularization method improves performance over the baseline system, and the shorter the recognition segments, the more significant the improvement. Mandarin Chinese CSR experiments on the Microsoft speech database show that the regularization method degrades system performance when adaptation data is sufficient. When adaptation data is limited, however, the regularization method effectively improves robustness.

Eigenvoices adaptation is a linear subspace method and therefore cannot explore the intrinsic structure of a nonlinear subspace. The Orthogonal Laplacian adaptation method proposed in this dissertation addresses this problem.
In this method, the Orthogonal Locality Preserving Projection (OLPP) algorithm is used to adjust the eigenspace derived from the Eigenvoices algorithm, improving its locality-preserving power while retaining as much acoustic information as possible. Both the system framework and the implementation steps for language recognition and CSR are given. Language recognition experiments on the NIST LRE 2003 evaluation corpus show that this new approach makes the features more discriminative. CSR experiments on the Microsoft speech database further verify that the new method outperforms Eigenvoices adaptation.

A feature-level speaker adaptation method, named feature-space Eigenvoices adaptation, is also proposed. In this method, as in RATZ, the speaker information in the feature space is modeled by a Gaussian mixture model. Moreover, the number of parameters to be estimated is reduced by taking the dependencies among these parameters into account, so the method can build a more accurate feature-space model from very little data. CSR experiments on the Microsoft speech database show that this method still achieves good performance when adaptation data is limited, and that speaker adaptive training based on this method further reduces the word error rate.
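To make the regularized Eigenvoices idea above more concrete, the following is a minimal sketch and not the dissertation's actual formulation. It assumes unit variances, an L2 penalty on the speaker factor y, and a speaker supervector of the form m0 + V y. The abstract does not say which regularization term is used, so the L2 choice and the names `solve_linear`, `estimate_speaker_factor`, `counts`, `resid`, and `lam` are all hypothetical. Under these assumptions the speaker factor solves (V^T N V + lam I) y = V^T r, where N is the diagonal matrix of occupancy counts and r holds the mean-centered first-order statistics.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def estimate_speaker_factor(V, counts, resid, lam=0.0):
    """Estimate the speaker factor y (assumed L2-regularized form).

    V      : D x K eigenvoice matrix as a list of rows
    counts : occupancy count for each of the D supervector dimensions
    resid  : first-order statistics centered on the speaker-independent mean
    lam    : weight of the assumed L2 penalty; 0 gives the unregularized estimate
    """
    D, K = len(V), len(V[0])
    # A = V^T N V + lam * I
    A = [[sum(counts[d] * V[d][i] * V[d][j] for d in range(D))
          + (lam if i == j else 0.0)
          for j in range(K)] for i in range(K)]
    # b = V^T r
    b = [sum(V[d][i] * resid[d] for d in range(D)) for i in range(K)]
    return solve_linear(A, b)


# Toy statistics (made up for illustration only).
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
counts = [1.0, 1.0, 1.0]
resid = [1.0, 2.0, 3.0]

y_ml = estimate_speaker_factor(V, counts, resid, lam=0.0)
y_reg = estimate_speaker_factor(V, counts, resid, lam=1.0)
print("unregularized:", y_ml)
print("regularized:  ", y_reg)
```

In this sketch the penalty shrinks the speaker factor toward zero, that is, toward the speaker-independent model. This matches the trade-off reported above: shrinkage helps when the statistics are unreliable because data is scarce, and can cost accuracy when data is plentiful.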
Keywords/Search Tags: Continuous Speech Recognition, Speaker Adaptation, Manifold Learning, Eigenvoice, Orthogonal Locality Preserving Projection, Regularization Method, Feature Normalization