
Objectively measured descriptors for perceptual characterization of speakers

Posted on: 2000-05-23
Degree: Ph.D.
Type: Thesis
University: Georgia Institute of Technology
Candidate: Necioglu, Burhan Fazil
Full Text: PDF
GTID: 2468390014465715
Subject: Engineering
Abstract/Summary:
Speaker recognizability has long been identified as one component in the evaluation of communications systems. Although the intelligibility and voice quality aspects of evaluation have taken relative precedence, speaker recognizability is emerging as an additional major issue with the more widespread use of lower bit rate speech coders. Subjective testing of speaker recognizability, however, is intricate, time-consuming and very expensive, so augmenting the subjective tests with objectively measurable descriptors could increase their efficiency and reliability. This thesis presents a variety of descriptors, objectively extracted from the speech waveform, that might be useful in characterizing and interpreting perceptual speaker differences. These descriptors belong to three broad classes of speech production properties (prosodic, vocal tract and glottal) and include various measurements on pitch and energy contours, formant-related statistics, average vocal tract length estimates, and glottal pulse parameters. To assess the potential of this large set of speech waveform descriptors, their reliability, RMS measurement noise and strength of speaker clustering were estimated using sets of 86 male and 78 female TIMIT speakers. The actual speaker discrimination abilities of the descriptors were determined by maximum-likelihood same/different classification of speaker pairs using their utterance pair measurement distances, without the need to model individual speakers. Using pairs of utterances approximately 12 seconds in length, and combining the likelihood scores of ten descriptors from all three broad classes, it was possible to make zero same-speaker classification errors while achieving a different-speaker classification error rate of less than 1%, on separate testing/training speaker sets. When utterance lengths were reduced by half, the average error rate stayed below 4%.
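The same/different decision described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes, purely for illustration, that each descriptor's utterance-pair distance is Gaussian within the same-speaker and different-speaker classes, that descriptors are independent so their log-likelihood ratios can be summed, and that the decision threshold is zero. The descriptor names and all numbers are made-up toy values.

```python
import math
from statistics import mean, pvariance


def fit_gaussian(values):
    """Return (mean, variance) of a list of distances, with a small variance floor."""
    return mean(values), max(pvariance(values), 1e-9)


def log_gaussian(x, mu, var):
    """Log density of a Gaussian with mean mu and variance var, evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def train(same_dists, diff_dists):
    """Fit one Gaussian per class and per descriptor.

    same_dists / diff_dists map a descriptor name to the list of distances
    measured on same-speaker / different-speaker training utterance pairs.
    """
    return {
        name: (fit_gaussian(same_dists[name]), fit_gaussian(diff_dists[name]))
        for name in same_dists
    }


def combined_llr(model, dists):
    """Sum per-descriptor log-likelihood ratios (independence is an assumption)."""
    total = 0.0
    for name, (same_params, diff_params) in model.items():
        d = dists[name]
        total += log_gaussian(d, *same_params) - log_gaussian(d, *diff_params)
    return total


def classify(model, dists):
    """Label an utterance pair 'same' if the combined ratio favours that class."""
    return "same" if combined_llr(model, dists) > 0.0 else "different"


# Hypothetical toy training distances (not data from the thesis).
same_train = {
    "mean_pitch_hz": [2.0, 3.5, 1.0, 4.0],
    "vocal_tract_cm": [0.10, 0.20, 0.15, 0.05],
}
diff_train = {
    "mean_pitch_hz": [20.0, 35.0, 15.0, 28.0],
    "vocal_tract_cm": [0.9, 1.4, 0.7, 1.1],
}
model = train(same_train, diff_train)

print(classify(model, {"mean_pitch_hz": 2.5, "vocal_tract_cm": 0.1}))
print(classify(model, {"mean_pitch_hz": 25.0, "vocal_tract_cm": 1.0}))
```

Because only distances between utterance pairs are modeled, no per-speaker model is needed, which matches the point made in the abstract. The distance distributions and the way scores are combined here are stand-ins.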
The perceptual relevance of this set of descriptors was investigated using the perceptual spaces obtained by multidimensional scaling on the averaged subjective dissimilarity judgments between speaker utterance pairs drawn from sets of ten male and ten female speakers, with a large collection of recorded sentences. Highly statistically significant rank-order correlations were detected between all the dimensions of the multidimensional scaling solutions representing the perceptual spaces of male and female speakers, and several prosodic, vocal tract and glottal descriptors.
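A rank-order correlation of the kind mentioned above can be sketched as follows. This assumes Spearman's coefficient (Pearson correlation computed on average ranks); the abstract does not say which rank-order statistic was used, and the sketch omits both the multidimensional scaling step and the significance test. The speaker coordinates and descriptor values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical values for five speakers (not data from the thesis):
# coordinates on one perceptual dimension and one objective descriptor.
perceptual_dim = [-1.2, -0.4, 0.1, 0.6, 1.5]
mean_pitch_hz = [98.0, 110.0, 105.0, 131.0, 142.0]

rho = spearman(perceptual_dim, mean_pitch_hz)
print(f"Spearman rho = {rho:.3f}")
```

A coefficient near +1 or -1 would indicate that the descriptor orders the speakers almost the same way as the perceptual dimension does, which is the sense in which a descriptor is perceptually relevant here.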
Keywords/Search Tags: Speaker, Descriptors, Perceptual, Vocal tract, Objectively