ERIC Number: ED523367
Record Type: Non-Journal
Publication Date: 2010
Pages: 181
Abstractor: As Provided
Reference Count: N/A
ISBN: 978-1-1243-5568-9
Preserving Long-Term Access to United States Government Documents in Legacy Digital Formats
Woods, Kam A.
ProQuest LLC, Ph.D. Dissertation, Indiana University
Over the past several decades, millions of digital objects of significant scientific, economic, cultural, and historic value have been published and distributed to libraries and archives on removable media. Providing long-term access to these documents, media files, and software executables is an increasingly complex task because of dependencies on aging or legacy hardware and software. This is a persistent problem for both digital libraries and long-term digital archives, where mandates to maintain and improve access can be overshadowed by ongoing technical and administrative costs associated with digital collections. There are several widely accepted techniques used by the archival community to preserve materials originally held on legacy media: bitstream preservation, migration of documents from aging formats to modern ones, and emulation for legacy executables. I demonstrate how these techniques can be combined to provide high-quality access to digital collections without compromising long-term archival processes or increasing risk. I show that most technical risk to preserving and accessing legacy born-digital documents can be effectively managed through the careful application of existing open source tools paired with some custom software. I focus on the collection of Government Printing Office documents held on legacy optical and magnetic removable media at the Indiana University Libraries. This collection contains millions of born-digital objects (documents and software) in hundreds of formats. I present a systematic approach to transferring bit-identical file-systems from legacy media to modern storage, ensuring future operation within legacy environments and supporting integrity checks and deduplication tasks. I describe reliable, high-performance techniques for automated identification, feature extraction, migration, rendering, and distribution of the documents and software contained in this collection. 
I examine methods that exemplify best practices for providing Web access to digital collections, including high-performance indexing, generation of and access to machine- and human-readable metadata, on-demand migration and rendition of legacy documents, and the construction of a "virtual file-system" to simplify navigation of the digital archive. Finally, I examine the relationship between these techniques and the development of quantifiable measures of risk for legacy digital objects. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page:]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: Higher Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Identifiers - Location: Indiana; United States