
Speech science modeling for automatic accent and dialect classification

Posted on: 2008-04-04
Degree: Ph.D.
Type: Thesis
University: University of Colorado at Boulder
Candidate: Gray, Sharmistha Sarkar
GTID: 2445390005470213
Subject: Health Sciences
Abstract/Summary:
This dissertation addresses automatic speech recognition techniques for the detection and classification of non-native accents and dialects. The first thesis goal is to establish a better understanding of the scientific basis on which accent is encoded and conveyed in human speech communication. To achieve this goal, phonetic aspects of various languages are studied, and the traits that transfer from the first language to a second language (here, American English [AE]) are identified. The research concentrates on how phonological variations of stops across languages (at word-initial and word-final positions only) can amplify the perception of accent when non-native speakers speak AE. Languages from three broad categories are examined: Aspirated Languages, Voicing Languages, and Breathy Languages. Speakers whose first language belongs to one of these groups are observed to show varying aspiration or closure-time length for word-initial or word-final unvoiced stops when speaking AE as a second language. These differences, measured by Voice Onset Time (VOT) and Stop Closure Time (SCT), are important temporal features for speech perception, speech recognition, and accent detection, yet they are generally ignored in fixed-length, frame-based speech processing.

The second thesis goal is to formulate effective algorithms for automatic accent detection and assessment that suppress speaker- and context-dependent characteristics while emphasizing accent-sensitive traits. An effective VOT detection scheme is introduced for unvoiced stops (/p/, /t/ and /k/); it uses the nonlinear energy-tracking Teager Energy Operator (TEO) across a sub-band frequency partition. The proposed VOT algorithm also incorporates spectral differences between the Voice Onset Region (VOR) and the succeeding vowel of a given stop-vowel cluster to classify speakers of different origins.
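The abstract does not give the filter bank, thresholds, or decision rules of the TEO-based VOT detector, so the following is only a minimal Python sketch of the general idea under stated assumptions. It applies the discrete TEO, psi[n] = x[n]^2 - x[n-1]x[n+1], in a high band to locate the stop burst and in a low band to locate voicing onset, and reports their difference as VOT. The two-band split (a first difference and an 8-tap moving average), the 2 ms frames, the 0.2 x peak threshold, and every function name here are hypothetical choices, not the author's method. The synthetic test signal places its 6 kHz "aspiration" tone on a spectral null of the 8-tap average at 16 kHz so that these crude bands separate cleanly; real speech would need proper band-pass filters.

```python
import math


def teager(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The two end samples are set to 0 so the output stays aligned with x.
    """
    psi = [0.0] * len(x)
    for n in range(1, len(x) - 1):
        psi[n] = x[n] * x[n] - x[n - 1] * x[n + 1]
    return psi


def high_band(x):
    """First difference: a crude high-pass split (hypothetical stand-in for a filter bank)."""
    return [x[0]] + [x[n] - x[n - 1] for n in range(1, len(x))]


def low_band(x, taps=8):
    """Causal moving average: a crude low-pass split (hypothetical stand-in)."""
    return [sum(x[max(0, n - taps + 1):n + 1]) / taps for n in range(len(x))]


def frame_means(psi, frame):
    """Mean |TEO| over non-overlapping frames of `frame` samples."""
    return [sum(abs(v) for v in psi[i:i + frame]) / frame
            for i in range(0, len(psi) - frame + 1, frame)]


def onset(envelope, frame, ratio):
    """Start sample of the first frame whose TEO energy reaches ratio * peak."""
    if not envelope or max(envelope) <= 0.0:
        return None
    threshold = ratio * max(envelope)
    for k, e in enumerate(envelope):
        if e >= threshold:
            return k * frame
    return None


def estimate_vot_ms(x, fs, frame_ms=2.0, ratio=0.2):
    """VOT = voicing onset (low-band TEO) minus burst onset (high-band TEO), in ms."""
    frame = max(1, int(round(fs * frame_ms / 1000.0)))
    burst = onset(frame_means(teager(high_band(x)), frame), frame, ratio)
    voicing = onset(frame_means(teager(low_band(x)), frame), frame, ratio)
    if burst is None or voicing is None:
        return None
    return 1000.0 * (voicing - burst) / fs


def synthetic_stop_vowel(fs=16000):
    """Hypothetical test signal: 10 ms silence, 50 ms of a 6 kHz tone standing in
    for aspiration noise, then 50 ms of a 200 Hz tone standing in for voicing."""
    silence = [0.0] * (fs // 100)
    burst = [0.3 * math.sin(2 * math.pi * 6000 * n / fs) for n in range(fs // 20)]
    voicing = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs // 20)]
    return silence + burst + voicing


if __name__ == "__main__":
    print(estimate_vot_ms(synthetic_stop_vowel(16000), 16000))  # 50.0 for this signal
```

Because onsets are located frame by frame, the estimate is quantized to the 2 ms frame step; a finer frame or sample-level refinement around the detected frame would be a natural next step.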
The spectral cues are enhanced by four types of feature parameter extraction: the Discrete Mellin Transform (DMT), the Discrete Mellin Fourier Transform (DMFT), the Discrete Wavelet Transform using the highest frequency resolutions (DWThfr), and the Discrete Wavelet Transform using the lowest frequency resolutions (DWTlfr).

The third thesis goal is to formulate an automatic algorithm for tagging accent- and dialect-sensitive speech corpora, with particular attention to conversational speech, by employing multi-dimensional methods in an integrated scheme. Three methods for accent and dialect classification, some pre-existing and some newly developed in this thesis, are combined into a weighted scheme to perform automatic text-tagging of dialect- and accent-sensitive speech corpora. Temporal and spectral features, such as the Stochastic Trajectory Model (STM), pitch structure, formant location, VOT, and syllable rate, are considered in this integrated scheme.

The next generation of Spoken Document Retrieval (SDR) systems will require a more diverse set of speech criteria, including speaker, accent/dialect, language, stress/emotion, and environment content. It is shown that this integrated approach to accent/dialect detection and classification can be successfully applied to rich indexing of historical spoken documents with accent/dialect information. Two examples of rich transcript indexing, using conversational speech material from the World Wide Web, are presented to demonstrate the effectiveness of the proposed methods. Collectively, this dissertation exploits the differences and similarities among world languages to develop algorithms for automatic accent and dialect classification. This research will enrich and provide new directions for next-generation human-computer spoken language technology.
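The abstract states only that three classifiers are combined into a weighted scheme for tagging; it does not say how their outputs are normalized or how the weights are chosen. As an illustration, the following Python sketch assumes each classifier returns a log-likelihood-style score per class, converts each classifier's scores to posteriors with a softmax, and takes a weight-normalized sum before picking the top label. The classifier names ("stm", "pitch", "vot"), the weights, and the use of the abstract's three first-language groups as class labels are all hypothetical.

```python
import math


def softmax(scores):
    """Convert one classifier's per-label scores into posteriors that sum to 1."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - m) for label, s in scores.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}


def fuse_and_tag(classifier_scores, weights):
    """Weighted sum of per-classifier posteriors; returns (best label, fused posteriors).

    classifier_scores: {classifier name: {label: score}}
    weights:           {classifier name: non-negative weight}
    """
    if not classifier_scores:
        raise ValueError("need at least one classifier")
    total_w = sum(weights[name] for name in classifier_scores)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = {}
    for name, scores in classifier_scores.items():
        for label, p in softmax(scores).items():
            fused[label] = fused.get(label, 0.0) + weights[name] * p / total_w
    best = max(fused, key=fused.get)
    return best, fused


if __name__ == "__main__":
    # Hypothetical scores from three classifiers for one conversational segment.
    scores = {
        "stm":   {"aspirated": 1.2, "voicing": 0.4, "breathy": -0.3},
        "pitch": {"aspirated": 0.1, "voicing": 0.6, "breathy": 0.2},
        "vot":   {"aspirated": 2.0, "voicing": -0.5, "breathy": 0.0},
    }
    weights = {"stm": 0.5, "pitch": 0.2, "vot": 0.3}  # hypothetical weights
    label, posteriors = fuse_and_tag(scores, weights)
    print(label, {k: round(v, 3) for k, v in posteriors.items()})
```

Normalizing each classifier to posteriors first keeps one classifier's score scale from swamping the others, so the weights alone control how much each method contributes to the tag.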
Keywords/Search Tags: Accent, Speech, Dialect, Classification, Automatic, Language, Detection, Thesis