
Advancements in robust algorithm formulation for dialect and speaker recognition

Posted on: 2012-02-25 | Degree: Ph.D. | Type: Thesis
University: The University of Texas at Dallas | Candidate: Lei, Yun
GTID: 2458390011451378 | Subject: Engineering
Abstract/Summary:
The speech signal carries many levels of information beyond its text content, including speaker information (e.g., dialect/accent, gender, emotion, age, identity) and environment information (e.g., channel, background noise, room conditions). This thesis focuses on two important tasks that extract such information from the speech signal: automatic dialect classification and automatic speaker recognition.

This thesis proposes two novel algorithms to improve dialect classification of text-independent spontaneous speech in Arabic and Spanish, along with probe experiments on Chinese. Both algorithms are formulated within a Gaussian mixture model (GMM) framework: Kullback-Leibler (KL) divergence based mixture selection is used in the training phase, and frame selection decoding is used in the testing phase. The main motivation for both algorithms is to suppress confusable/distracting regions of the dialect language space and to emphasize discriminative/sensitive information in the available dialects. In addition, because the differences among dialects are very subtle, performance is more sensitive to mismatch from other components of the speech signal. To compensate for this mismatch and focus on the intrinsic dialect properties themselves, the well-known factor analysis based mismatch compensation approach is applied and extended to compensate for various distortions (e.g., gender, speaker, and channel) in dialect identification. As a result, only dialect information is emphasized, which improves overall performance.

The second goal of the thesis addresses speaker recognition, where factor analysis, one of the most important techniques in the field, is widely used for model training and channel compensation. The correlation between the speaker and distortions (e.g., channel and additive noise) is analyzed and modeled.
A simplified version of the resulting model is then fitted into a joint factor analysis (JFA) framework, since factor analysis has proven very effective at improving performance. Next, to avoid the approximation introduced by this simplification in the JFA framework, the total variability model is studied, and a new supervised approach is proposed that preserves more speaker-specific information than the total variability model, which is an unsupervised approach based on probabilistic principal component analysis. A combination of the proposed supervised approach and the traditional unsupervised approach is also proposed and evaluated. Evaluations are performed on NIST SRE-2008 data.

This thesis therefore contributes improved modeling and classification strategies for dialect/accent recognition as well as speaker recognition, based on leveraging discriminative knowledge learned during modeling. These advancements will ultimately contribute to improved speech processing and language technology solutions.
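The KL divergence based mixture selection mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the selection criterion, so everything below is an assumption for illustration only: dialect GMMs with diagonal covariances are assumed to be MAP-adapted from a common universal background model (UBM), so component m in one dialect model corresponds to component m in every other. Each component is scored by its mean pairwise symmetric KL divergence across dialects, and only the most divergent (discriminative) components are kept, while low-divergence (confusable) ones are dropped. The names `kl_diag`, `symmetric_kl`, and `select_mixtures` are hypothetical and do not come from the thesis.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation of one diagonal-covariance Gaussian component:
# (mean vector, variance vector). Mixture weights are not needed here because
# components are compared one-to-one, not as whole GMMs.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def kl_diag(p: Gaussian, q: Gaussian) -> float:
    """Closed-form KL(p || q) between two diagonal-covariance Gaussians."""
    mu_p, var_p = p
    mu_q, var_q = q
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
    return 0.5 * total


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    """Symmetrized KL divergence, since KL itself is not symmetric."""
    return kl_diag(p, q) + kl_diag(q, p)


def select_mixtures(models: Dict[str, List[Gaussian]], keep: int) -> List[int]:
    """Return the indices of the `keep` most dialect-discriminative components.

    Assumption (not from the thesis): every dialect GMM was MAP-adapted from
    the same UBM, so component m in one model corresponds to component m in
    every other model. Each component is scored by its mean pairwise
    symmetric KL divergence across all dialect pairs.
    """
    names = sorted(models)
    if len(names) < 2:
        raise ValueError("need at least two dialect models")
    num_components = len(models[names[0]])
    pairs = list(combinations(names, 2))
    scores = []
    for m in range(num_components):
        div = sum(symmetric_kl(models[a][m], models[b][m]) for a, b in pairs)
        scores.append((div / len(pairs), m))
    # High divergence: the dialects disagree here (discriminative).
    # Low divergence: the dialects overlap here (confusable), so drop it.
    scores.sort(reverse=True)
    return sorted(m for _, m in scores[:keep])
```

For example, with two one-dimensional dialect models whose component 1 means differ by 3 (unit variances, symmetric KL of 9) while components 0 and 2 nearly or exactly coincide, `select_mixtures(models, keep=1)` returns `[1]`. Scoring matched components rather than whole GMMs keeps the divergence in closed form; KL divergence between two full GMMs has no closed form and would need approximation.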
Keywords/Search Tags: Speaker, Dialect, Speech, Factor analysis, Information, Model