Preserving long-term access to United States government documents in legacy digital formats

Posted on: 2011-07-07
Degree: Ph.D.
Type: Dissertation
University: Indiana University
Candidate: Woods, Kam A
Full Text: PDF
GTID: 1448390002450274
Subject: Information Science
Abstract/Summary:
Over the past several decades, millions of digital objects of significant scientific, economic, cultural, and historic value have been published and distributed to libraries and archives on removable media. Providing long-term access to these documents, media files, and software executables is an increasingly complex task because of dependencies on aging or legacy hardware and software. This is a persistent problem for both digital libraries and long-term digital archives, where mandates to maintain and improve access can be overshadowed by ongoing technical and administrative costs associated with digital collections.

There are several widely accepted techniques used by the archival community to preserve materials originally held on legacy media: bitstream preservation, migration of documents from aging formats to modern ones, and emulation for legacy executables. I demonstrate how these techniques can be combined to provide high-quality access to digital collections without compromising long-term archival processes or increasing risk. I show that most technical risk to preserving and accessing legacy born-digital documents can be effectively managed through the careful application of existing open source tools paired with some custom software. I focus on the collection of Government Printing Office documents held on legacy optical and magnetic removable media at the Indiana University Libraries. This collection contains millions of born-digital objects (documents and software) in hundreds of formats.

I present a systematic approach to transferring bit-identical filesystems from legacy media to modern storage, ensuring future operation within legacy environments and supporting integrity checks and deduplication tasks. I describe reliable, high-performance techniques for automated identification, feature extraction, migration, rendering, and distribution of the documents and software contained in this collection. I examine methods that exemplify best practices for providing Web access to digital collections, including high-performance indexing, generation of and access to machine- and human-readable metadata, on-demand migration and rendition of legacy documents, and the construction of a "virtual filesystem" to simplify navigation of the digital archive. Finally, I examine the relationship between these techniques and the development of quantifiable measures of risk for legacy digital objects.
Keywords/Search Tags: Digital, Legacy, Documents, Access, Objects, Long-term, Techniques