The Research On Robustness Of Speaker Recognition

Posted on: 2018-06-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F R Yan
Full Text: PDF
GTID: 1318330518494731
Subject: Communication and Information System
Abstract/Summary:
As an important biometric authentication technology, speaker recognition is widely applied in identity authentication, human-machine interaction, network information security, security, and other fields. The technology achieves good performance under certain controlled conditions. However, mismatched environments, channels, and codecs, which are common in practical applications, greatly degrade the performance of speaker recognition systems. The mismatch between training and test acquisition conditions is one of the obstacles in the development of speaker recognition technology.
This thesis first addresses the performance degradation of speaker recognition systems caused by environment mismatch. The major research work and innovations include the following aspects:
1. Missing data technology has been proposed to handle environment mismatch, based on the observation that the data contain redundant information, and it can greatly improve robustness. However, at low signal-to-noise ratios (SNRs), the proportion of corrupted data is larger and the proportion of observed data is smaller, which inevitably degrades performance. To solve this problem, the thesis draws on the principle of reconstruction and presents a new sub-band reconstruction method for feature enhancement. When the full band is divided into sub-bands, the correlation is measured with the concentration level proposed in the thesis, and the division is found to greatly increase the correlation within the feature vector. Based on this analysis, the correlation is exploited for reconstruction in missing data technology. The proposed method therefore effectively addresses the poor performance of full-band reconstruction at low SNRs.
The proposed method is evaluated in a speaker recognition system in which feature vectors are randomly removed and the test utterances are mixed with different noise types at various SNRs. The experimental results show that sub-band reconstruction outperforms full-band reconstruction in recognition performance, and the advantage is more pronounced when the proportion of missing data is larger. The method is feature-based and does not depend on the model, so it can be integrated into different recognition systems and is widely applicable.
2. Feature warping is a feature normalization technique that maps feature vectors to another domain to counter the decline in recognition performance caused by environment and channel mismatch. The key factor determining its performance is the relative position of a feature within a sliding window, which can be called the ranking feature. Theoretical analysis and experimental results show that these relative positions are changed by the non-linear effect of noise. To address this problem, the thesis proposes a feature enhancement method based on the ranking feature. The method exploits the advantages of the ranking feature, eliminates the non-linear effect of noise to a certain extent, and addresses the degraded performance of feature warping under unstable (non-stationary) noise. It not only improves system performance under mismatch but also effectively shields feature warping from the non-linear characteristics of noise.
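
As a point of reference for the ranking feature discussed above, conventional feature warping can be sketched as follows. This is a minimal illustration of the standard technique, not the enhancement method proposed in the thesis. The function name `warp_feature_stream`, the 301-frame default window, the standard normal target distribution, and the truncation of the window at utterance edges are all assumptions made for the example.

```python
from statistics import NormalDist


def warp_feature_stream(values, window=301):
    """Warp one feature dimension toward a standard normal distribution.

    For each frame, the rank of the current value inside a sliding window
    (the "ranking feature") is mapped through the inverse normal CDF.
    Hypothetical helper: the window length and edge handling are assumptions.
    """
    std_normal = NormalDist()  # target distribution: mean 0, std 1
    half = window // 2
    warped = []
    for t, x in enumerate(values):
        # Window centred on frame t, truncated at the utterance edges.
        lo = max(0, t - half)
        hi = min(len(values), t + half + 1)
        win = values[lo:hi]
        n = len(win)
        # Ascending rank of x in the window (1 = smallest value).
        rank = 1 + sum(1 for v in win if v < x)
        # Mid-rank probability lies strictly between 0 and 1.
        warped.append(std_normal.inv_cdf((rank - 0.5) / n))
    return warped


if __name__ == "__main__":
    clean = [5.0, 1.0, 3.0, 2.0, 4.0]
    # A gain and offset change leaves the ranks, and so the output, unchanged.
    shifted = [2.0 * v + 5.0 for v in clean]
    print(warp_feature_stream(clean, window=5))
    print(warp_feature_stream(shifted, window=5))
```

Because the output depends only on ranks within the window, an order-preserving distortion such as a gain or offset change leaves it unchanged, whereas noise that reorders values inside the window alters the result. This is the sensitivity that the ranking-feature enhancement described above is aimed at.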
The proposed feature enhancement approach is evaluated in an open-set speaker recognition system, and the experimental results show that it improves speaker recognition performance under mismatched conditions. Furthermore, because the method is feature-based, it may be applied to speaker or speech recognition systems built on many different models to enhance their robustness.
With the wide application of digital voice communication technologies, speech is compressed by encoding for efficient storage and transmission. However, low-bit-rate codecs destroy the structure and statistical distribution of the feature vectors, so the performance of speaker recognition systems drops dramatically. To address codec mismatch, the major research work and innovations include the following aspects:
1. Different codec schemes cause different feature distortion. As a result, a model trained under one codec scheme cannot describe a test utterance under another codec scheme well. When the training and test codecs differ, the Universal Background Model (UBM) cannot describe the test utterance, and the I-vector extracted from the UBM is inaccurate because of the model distortion. To address this problem, the thesis proposes a compensation method based on model distortion. The method first derives the model distortion from the feature distortion, then adapts the trained model in real time so that the codec format of the training utterances matches that of the test utterance. A more robust I-vector is then extracted from the adapted model. In other words, the method first quantifies the changes in the model caused by the feature distortion and then compensates the model while estimating the distribution of the coding-decoding distortion.
The results show that the proposed method dramatically reduces the effect of coding-decoding distortion and improves the recognition performance of the system. In addition, the method does not need to know the codec type of the test utterance, and its computational cost is small.
2. To reduce the adverse impact of codec mismatch on the trained model, the thesis proposes a compensation method based on an integrated model. Building on the work above, the method first establishes an integrated model of the features of uncoded speech and the coding-decoding distortion features to describe the features of coded-decoded speech. It then uses a test utterance and the integrated model to adapt the trained model in real time so that it better matches the codec format of the test utterance. A more robust I-vector is then extracted. The experimental results show that the proposed method greatly improves recognition performance under codec mismatch, especially at lower bit rates. Moreover, the method requires only one adaptation utterance and suits systems in which few utterances are available. Finally, the method needs no information about the codec format of the test speech and can be used in real-time systems.
Keywords/Search Tags: speaker recognition, noise, coding-decoding, mismatch, robustness