Automatic language identification with sequences of language-independent phoneme clusters

Posted on:1997-03-13

Degree:Ph.D

Type:Thesis

University:Oregon Graduate Institute of Science and Technology

Candidate:Berkling, Kay Margarethe

Full Text:PDF

GTID:2468390014980237

Subject:Computer Science

Abstract/Summary:

Automatic language identification involves analyzing language-specific features in speech to determine the language of an utterance without regard to topic, speaker or length of speech. Although much progress has been made in recent years, language identification systems have not been built on detailed underlying theory or linguistically meaningful design criteria. This thesis is motivated by the belief that features used to discriminate between languages should be linguistically sound; the result is a unique combination of design, theory and implementation.

In this thesis, a "word-spotting" algorithm is introduced, motivated by a perceptual study (82) reporting that human subjects use language-dependent phonemes and short sequences to identify languages. To find an optimal set of phoneme-like tokens that represent speech in a linguistically meaningful way, a mathematical model of the discrimination between two languages is developed. This model permits the automatic design of a token representation of speech by selecting a list of discriminating "words" in a data-driven manner. The resulting system can automatically take into account the inherent structure of the languages to be discriminated. A second mathematical model is developed to measure the impact of inaccurate automatic alignment of tokens on language discrimination. This model indicates why some algorithms that aim to compensate for these inaccuracies have not been successful. The theoretical models and the "word"-spotting algorithm have been implemented and validated on both generated and real-world speech data.

This dissertation makes several significant contributions: the design of a simple and linguistically sound language-identification module; a flexible automatic feature extraction algorithm; a mathematical model to estimate the discriminability of two languages; and a mathematical model to capture the impact of inaccurate alignment on the discriminability of two languages.
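As a rough illustration of what selecting discriminating "words" in a data-driven manner could look like, the sketch below scores short token sequences by how differently they occur in two languages and then classifies an utterance by the "words" spotted in it. It is not the thesis's mathematical model: the smoothed log-ratio score, the function names (`ngrams`, `select_words`, `classify`), the parameters (`max_n`, `top_k`, `alpha`) and the toy token data are all assumptions made for this example.

```python
from collections import Counter
from math import log


def ngrams(tokens, max_n=3):
    """Yield every token sequence of length 1..max_n as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def select_words(utts_a, utts_b, max_n=3, top_k=5, alpha=1.0):
    """Return {word: score} for the top_k most discriminating sequences.

    The score is a smoothed log-ratio of relative frequencies (an
    assumption of this sketch): positive favours language A, negative B.
    """
    counts_a = Counter(g for u in utts_a for g in ngrams(u, max_n))
    counts_b = Counter(g for u in utts_b for g in ngrams(u, max_n))
    vocab = set(counts_a) | set(counts_b)
    # Add-alpha smoothing so sequences unseen in one language stay finite.
    total_a = sum(counts_a.values()) + alpha * len(vocab)
    total_b = sum(counts_b.values()) + alpha * len(vocab)
    scores = {
        g: log((counts_a[g] + alpha) / total_a)
        - log((counts_b[g] + alpha) / total_b)
        for g in vocab
    }
    # Rank by absolute score; break ties by the sequence itself so the
    # selection is deterministic.
    best = sorted(scores, key=lambda g: (-abs(scores[g]), g))[:top_k]
    return {g: scores[g] for g in best}


def classify(tokens, words, max_n=3):
    """Sum the scores of spotted words; a total of zero falls back to 'B'."""
    total = sum(words.get(g, 0.0) for g in ngrams(tokens, max_n))
    return "A" if total > 0 else "B"


# Hypothetical phoneme-like token sequences for two made-up languages.
utts_a = [["k", "a", "t", "a"], ["t", "a", "k", "a"]]
utts_b = [["s", "o", "l", "o"], ["l", "o", "s", "o"]]

words = select_words(utts_a, utts_b)
print(classify(["k", "a"], words))
print(classify(["s", "o"], words))
```

Ranking by the absolute score keeps sequences that are characteristic of either language. A real system would also have to cope with token recognition and alignment errors, which the abstract treats with a separate model.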

Keywords/Search Tags:

Language, Automatic, Mathematical model, Speech

Related items

1	Application Research On Statistical Language Model Of Large Vocabulary Continuous Speech Recognition System
2	Mongolian Language Model Based On Recurrent Neural Network
3	Automatic dialect classification: Advances for read and spontaneous speech, and printed text
4	Research On The Mongolian Language Model Based On Speech Recognition
5	Application Of Mathematical Morphology In Speech Signal Processing
6	Research On Automatic Segmentation Technology And Automatic Segmentation Of Speech In Dai Language Speech Synthesis System
7	Design And Implementation Of Speech Recognition System Based On DNN-LSTM
8	Research And Application Of Speech Recognition Based On Conformer
9	Chinese Speech Recognition System Based On CLDNN Hybrid Model
10	Research And Implementation Of Mathematical Expression Handwriting Recognition Technology Based On LSTM Model