
Amélioration de la robustesse des systèmes de reconnaissance automatique du locuteur dans l'espace des i-vecteurs (Improving the robustness of automatic speaker recognition systems in the i-vector space)

Posted on: 2015-11-07
Degree: D.Eng
Type: Thesis
University: École de Technologie Supérieure (Canada)
Candidate: Senoussaoui, Mohammed
Full Text: PDF
GTID: 2478390017493357
Subject: Computer Engineering
Abstract/Summary:
Most current speaker recognition systems represent speech in the i-vector space. An i-vector is a simple, low-dimensional vector (typically in the hundreds of dimensions) that captures a wide range of information carried by the voice signal. Although the recognition rates of these systems have reached a very high level, making better use of them in real, everyday environments still requires further research effort.

In this thesis, our main objective is to improve the robustness of speaker recognition systems operating in the i-vector space. In the first part of this work, we focus on the task of speaker verification, and in particular on the design of a verification system that is independent of both channel type (transmission/recording) and speaker gender. In the context of the i-vector representation, generative classifiers such as Probabilistic Linear Discriminant Analysis (PLDA) have dominated the field of speaker recognition. However, the simple classifier based on the cosine distance (CD) remains competitive. We therefore propose two solutions that make systems based on the two state-of-the-art classifiers (PLDA and CD) independent of channel type and of speaker gender, respectively. Systems designed in this way are considered the first two speaker verification systems to achieve state-of-the-art results (an equal error rate, EER, of around 2% for telephone speech and 3% for microphone speech) without using any information about either channel type or speaker gender.

Speaker clustering, another task within the speaker recognition discipline, is the subject of the second part of this thesis. Again, our research is conducted only in the context of the i-vector representation of speech. We consider two types of application: speaker clustering of large corpora and speaker diarization of audio streams. To tackle the problem of speaker clustering, we propose a new version of the non-parametric Mean Shift (MS) algorithm. We demonstrate that this new version, which is based on the cosine distance, performs better than the baseline version when tested on the speaker clustering task. Furthermore, the same algorithm enabled us to obtain state-of-the-art diarization results (a diarization error rate, DER, of 12.4%) when tested on the telephone speech of the CallHome data.
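The abstract does not give the exact formulation of the cosine-based Mean Shift, so the following is only a minimal sketch of one plausible variant, not the thesis's algorithm. It assumes a flat kernel (every neighbour within the bandwidth gets equal weight), length-normalized i-vectors, and a simple mode-merging step. The names `cosine_mean_shift`, `bandwidth`, and `merge_threshold` are hypothetical and chosen for illustration.

```python
import numpy as np


def normalize(x):
    """Scale vectors to unit Euclidean length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def cosine_mean_shift(X, bandwidth=0.3, merge_threshold=None,
                      max_iter=100, tol=1e-6):
    """Flat-kernel mean shift using cosine distance (illustrative sketch).

    X: array of shape (n_vectors, dim), e.g. one i-vector per segment.
    bandwidth: cosine-distance radius of the neighbourhood (assumed value).
    Returns (labels, centers).
    """
    X = normalize(np.asarray(X, dtype=float))
    if merge_threshold is None:
        merge_threshold = bandwidth / 2.0  # assumption, not from the source

    modes = np.empty_like(X)
    for i in range(len(X)):
        y = X[i]
        for _ in range(max_iter):
            # Cosine distance from the current estimate to every vector.
            dist = 1.0 - X @ y
            neighbours = X[dist <= bandwidth]
            if len(neighbours) == 0:
                break
            # Shift to the neighbourhood mean, projected back to unit length.
            y_new = normalize(neighbours.mean(axis=0))
            converged = (1.0 - y_new @ y) < tol
            y = y_new
            if converged:
                break
        modes[i] = y

    # Merge modes that ended up close to each other into one cluster.
    labels = np.full(len(X), -1, dtype=int)
    centers = []
    for i, mode in enumerate(modes):
        for k, center in enumerate(centers):
            if 1.0 - mode @ center <= merge_threshold:
                labels[i] = k
                break
        else:
            centers.append(mode)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)
```

As in standard Mean Shift, the number of clusters (speakers) is not fixed in advance; it follows from how many distinct modes survive the merging step, which in this sketch depends on the assumed `bandwidth`.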
Keywords/Search Tags:Speaker, Speech, Space, Systems, I-vector