Speech repairs, intonational boundaries and discourse markers: Modeling speakers' utterances in spoken dialog

Posted on: 1998-02-10
Degree: Ph.D.
Type: Thesis
University: University of Rochester
Candidate: Heeman, Peter Anthony
Full Text: PDF
GTID: 2465390014974767
Subject: Speech communication
Abstract/Summary:
Interactive spoken dialog provides many new challenges for natural language understanding systems. One of the most critical challenges is simply determining the speaker's intended utterances: both segmenting a speaker's turn into utterances and determining the intended words in each utterance. Even assuming perfect word recognition, the latter problem is complicated by the occurrence of speech repairs, which occur when the speaker goes back and changes (or repeats) something she just said. The words that are replaced or repeated are no longer part of the intended utterance, and so need to be identified. The two problems of segmenting the turn into utterances and resolving speech repairs are strongly intertwined with a third problem: identifying discourse markers. Lexical items that can function as discourse markers, such as "well" and "okay," are ambiguous as to whether they are introducing an utterance unit, signaling a speech repair, or are simply part of the content of an utterance, as in "that's okay." Spoken dialog systems need to address these three issues together and early on in the processing stream. In fact, just as these three issues are closely intertwined with each other, they are also intertwined with identifying the syntactic role or part-of-speech (POS) of each word and with the speech recognition problem of predicting the next word given the previous words.

In this thesis, we present a statistical language model for resolving these issues. Rather than finding the best word interpretation for an acoustic signal, we redefine the speech recognition problem so that it also identifies the POS tags, discourse markers, speech repairs and intonational phrase endings (a major cue in determining utterance units). Adding these extra elements to the speech recognition problem actually allows it to better predict the words involved, since we are able to make use of the predictions of boundary tones, discourse markers and speech repairs to better account for what word will occur next. Furthermore, we can take advantage of acoustic information, such as silence, which tends to co-occur with speech repairs and intonational phrase endings, and which current language models can only regard as noise in the acoustic signal. The output of this language model is a much fuller account of the speaker's turn, with a part-of-speech tag assigned to each word, intonational phrase endings and discourse markers identified, and speech repairs detected and corrected. In fact, the identification of the intonational phrase endings and discourse markers, together with the resolution of the speech repairs, allows the speech recognizer to model the speaker's utterances, rather than simply the words involved, and thus it can return a more meaningful analysis of the speaker's turn for later processing.
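
The redefinition of the recognition problem described in the abstract can be sketched roughly as follows. The symbols are illustrative assumptions and are not taken from the thesis: A is the acoustic signal, W the word sequence, P the POS tags, D the discourse marker tags, R the speech repair tags, and I the intonational phrase ending tags. The first line is the conventional word-only formulation; the second is a joint formulation of the kind the abstract describes, in which the language model term covers all of the tag sequences along with the words.

```latex
% Conventional formulation: find the best word sequence for the acoustic signal.
\hat{W} = \arg\max_{W} \Pr(W \mid A)
        = \arg\max_{W} \Pr(A \mid W)\,\Pr(W)

% Joint formulation (assumed notation): also recover POS tags P,
% discourse marker tags D, repair tags R, and intonational phrase endings I.
(\hat{W}, \hat{P}, \hat{D}, \hat{R}, \hat{I})
  = \arg\max_{W,P,D,R,I} \Pr(W, P, D, R, I \mid A)
  = \arg\max_{W,P,D,R,I} \Pr(A \mid W, P, D, R, I)\,\Pr(W, P, D, R, I)
```

In both lines the second equality follows from Bayes' rule, since \Pr(A) does not depend on the quantities being maximized over.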
Keywords/Search Tags:Speech repairs, Discourse markers, Spoken, Utterances, Speaker's turn, Intonational, Word, Simply