Robust Speaker Modeling in Non-Neutral Environments with Application to Large Scale Multi-Speaker Audio Stream

Posted on: 2018-02-10

Degree: Ph.D.

Type: Dissertation

University: The University of Texas at Dallas

Candidate: Yu, Chengzhu


GTID: 1478390020457697

Subject: Electrical engineering

Abstract/Summary:

With an explosive increase in the amount of multimedia content available worldwide and through the web, automatically detecting who spoke when in an audio stream has become an important technique with many practical applications. The task of automatically annotating speech segments with speaker labels can be framed as either a speaker recognition or a speaker diarization problem, depending on whether voice samples of the speakers are available a priori. Despite their differences, the success of both speaker recognition and speaker diarization hinges on accurate and robust modeling of speaker voice characteristics. Over the past several decades, statistical speaker modeling technology has advanced significantly. However, the performance of speaker recognition and speaker diarization in real-world applications remains considerably limited. In this dissertation, we investigate the application of speaker recognition and speaker diarization to the National Aeronautics and Space Administration (NASA) Apollo-11 mission audio corpus, with the goal of advancing their performance in practical applications. In the first part of this dissertation, we focus on understanding the problems and challenges of applying speaker recognition techniques to a subset of the Apollo-11 space-to-ground audio corpus in order to automatically recognize all three astronauts. Specifically, we investigate how the astronauts' voice characteristics vary across different phases of the lunar mission and how these variations affect speaker recognition performance. In the second part of this dissertation, we focus on the development of robust speaker recognition and diarization algorithms.
We illustrate the challenges of applying speaker diarization techniques to multi-speaker naturalistic audio streams such as the Apollo-11 mission control center (MCC) audio corpus, and we propose active-learning-based algorithms that effectively incorporate limited human effort into the current speaker diarization process. Moreover, we propose several robust speaker modeling techniques that improve speaker recognition in mismatched or noisy environments. Lastly, we discuss the application of speaker recognition and speaker diarization to conversation analysis on the Apollo-11 MCC audio corpus. This dissertation therefore advances speech and language technology for the diarization of multi-speaker naturalistic audio streams from real task-oriented teams. These advancements are expected to contribute significantly to research on human-to-human voice interaction for team-oriented tasks in business, social, government, and security applications.

Keywords/Search Tags:

Speaker, Audio, Applications
