| The proliferation of mobile computing has provided much of the world with the ability to record any sound of interest, or possibly every sound heard in a lifetime. The technology to continuously record the auditory world has applications in surveillance, biological monitoring of non-human animal sounds, and urban planning. Unfortunately, the ability to record anything has led to an audio data deluge, where there are more recordings than time to listen. Thus, access to these archives depends on efficient techniques for segmentation (determining where sound events begin and end), indexing (storing sufficient information with each event to distinguish it from other events), and retrieval (searching for and finding desired events). While many such techniques have been developed for speech and music sounds, the environmental and natural sounds that compose the majority of our aural world are often overlooked.;The process of analyzing audio signals typically begins with the process of acoustic feature extraction where a frame of raw audio (e.g., 50 milliseconds) is converted into a feature vector summarizing the audio content. In this dissertation, a dynamic Bayesian network (DBN) is used to monitor changes in acoustic features in order to determine the segmentation of continuously recorded audio signals. Experiments demonstrate effective segmentation performance on test sets of environmental sounds recorded in both indoor and outdoor environments.;Once segmented, every sound event is indexed with a probabilistic model, summarizing the evolution of acoustic features over the course of the event. Indexed sound events are then retrieved from the database using different query modalities. Two important query types are sound queries (query-by-example) and semantic queries (query-by-text). By treating each sound event and semantic concept in the database as a node in an undirected graph, a hybrid (content/semantic) network structure is developed. This hybrid network can retrieve audio information from sound or text queries, and can automatically annotate an unlabeled sound event with an appropriate semantic description. Successful retrieval of environmental sounds from the hybrid network using several test databases is demonstrated and quantified in terms of standard information retrieval metrics. |