INFO 284: Tools, Services, and Methodologies for Digital Curation

Fall 2021

Course Description

Overview of the tools, services, and methodologies used to manage data and digital objects throughout their lifecycle. Students will be introduced to trusted digital repositories and gain experience with tools and services such as DROID and Archive-It.

Learning outcomes

  1. Conceptualize and plan the creation and storage of digital data and objects.

  2. Determine specifications for a trusted digital repository or a digital archives/preservation service.

  3. Develop a migration plan for a digital collection to different formats.


Portfolio

Through the course readings and three core projects, I became familiar with the challenges of digital curation in preservation, access, and file management, and was introduced to the tools and methodologies for developing a trusted digital repository.

Designing a File Preservation Strategy

Project link

The project introduced DROID, PARADIGM methodology, MIME-types, and methods of discovering embedded metadata. Through the exercise of analyzing file formats, I am now able to recognize and recommend preservation-friendly formats and manage problems with obsolete and proprietary formats. Management includes strategies such as migration or emulation. Most importantly, the project offered the opportunity to develop a comprehensive preservation strategy.
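As a rough illustration of the format-analysis idea, the sketch below compares the MIME type implied by a file's extension with the type implied by its leading bytes. It is not how DROID works internally; DROID relies on the much larger PRONOM signature registry. The `SIGNATURES` table and the function names `sniff_mime` and `extension_mismatch` are hypothetical and cover only a handful of well-known formats.

```python
import mimetypes
from pathlib import Path

# Assumption: a tiny, hand-picked signature table for illustration only.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"II*\x00", "image/tiff"),   # little-endian TIFF
    (b"MM\x00*", "image/tiff"),   # big-endian TIFF
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_mime(path: Path):
    """Return a MIME type based on the file's first bytes, or None if unknown."""
    with path.open("rb") as handle:
        header = handle.read(16)
    for magic, mime in SIGNATURES:
        if header.startswith(magic):
            return mime
    return None


def extension_mismatch(path: Path) -> bool:
    """True when the content signature disagrees with the extension's MIME type."""
    claimed, _ = mimetypes.guess_type(path.name)
    actual = sniff_mime(path)
    return actual is not None and claimed != actual
```

A file named `photo.jpg` that actually begins with the PNG signature would be flagged by `extension_mismatch`, which is the kind of discrepancy worth reviewing before choosing a migration target.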

The Trusted Digital Repository (TDR) case study

Project link

The case study evaluated an institutional example of a trusted digital repository (TDR) in compliance with the OAIS reference model. I selected Stanford University and its use of DSpace as a digital repository. The study introduced me to the complexity and costs of developing and managing a digital repository, especially one as heavily customized as Stanford’s. Additionally, their Self-Deposit service is a strong example of leveraging repository systems to meet the end-to-end best practices of a TDR.

Archive-It: web archiving study

Project link

Preserving the wild frontier of web content seems impossible under the increasing deluge of digital content. This culminating team project provided a framework for curating an archived collection of online content while exploring rights management, file integrity, provenance and authenticity, digital preservation, and metadata application issues.


Professional Application

The Preservation Strategy project has since helped me considerably as a digital archivist specializing in media collections. I have implemented best practices for creating preservation-grade image file formats per Library of Congress and FADGI recommendations, and I have engaged knowledgeably in discussions of encoding and transcoding issues in video digitization projects. A recent example is why HEVC (H.265), while a more efficient codec, is problematic due to licensing and lower adoption compared with H.264. Knowing the differences between file formats and codecs has been critical both for user access and for researching digital asset management system (DAMS) capabilities.

Using DROID and checksums to evaluate files impressed upon me the power of digital curation tools. Manually evaluating a network drive of 100,000+ files for image files is costly and cannot effectively surface duplicate, corrupt, or compromised files.
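A minimal sketch of the checksum side of this, assuming SHA-256 as the fixity algorithm and using only the Python standard library. The function names (`sha256_of`, `find_duplicates`, `matches_recorded_checksum`) are hypothetical and not part of DROID or any other tool mentioned here.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> dict:
    """Map each checksum shared by two or more files to the list of those files."""
    by_checksum = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_checksum[sha256_of(path)].append(path)
    return {c: paths for c, paths in by_checksum.items() if len(paths) > 1}


def matches_recorded_checksum(path: Path, recorded: str) -> bool:
    """Fixity check: True if the file still matches a previously recorded digest."""
    return sha256_of(path) == recorded.lower()
```

Running `find_duplicates(Path("/path/to/share"))` over a directory tree returns only the checksums that occur more than once, and `matches_recorded_checksum` flags files whose content has changed since the digest was recorded, which is how corrupt or altered files surface without opening each one by hand.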

Learning about Stanford’s process for facilitating content creator submissions has recently led me to consider how a similar process might apply to managing creator submissions of digital video content into Panopto. The overarching question, inspired by Stanford, is this: what kind of framework and workflow can be developed to ensure easy file and metadata submission while adhering to security and digital preservation policies?

Additionally, understanding Stanford’s customization of DSpace and its stacked structure has since been invaluable for researching digital asset management systems for a research archive. Open-source and stacked systems can introduce security vulnerabilities for a classified archive. However, most out-of-the-box DAMS also rely on open-source components, such as Solr, to complete their otherwise closed stack. Being able to identify problems in a system’s architecture is critical for establishing a TDR. Stanford’s LOCKSS philosophy has also led me to think more carefully about the process of archiving digital preservation files, including the benefits of keeping an access file, a master file, and two preservation files.