
Digital Preservation – Interview with Matt Short

An Interview with Matt Short, Metadata Librarian at Northern Illinois University

Mary Burns, Special Collections Catalog Librarian at Northern Illinois University

For 2019-2020, the CARLI Preservation Committee is sharing a series of interviews with preservation managers, conservators, and other library specialists who graciously described their experiences with preservation and conservation topics of interest to CARLI libraries.

This month, Mary Burns, Special Collections Catalog Librarian at Northern Illinois University, asked Matt Short, Metadata Librarian at Northern Illinois University, to share some of his experiences with digital preservation and digital preservation policies. The goal of digital preservation is to provide perpetual, uninterrupted access to digital content as technology changes over time.

Interview with Matt Short on Digital Preservation:

What preservation role do you play in your institution?

I administer our digital preservation plan and advise our Digital Collections Steering Committee and Preservation Committee on modifications to our digital preservation policy.

Why should a library with a digital collection have a digital preservation policy? Why did you and colleagues undertake writing a digital preservation policy?

At some point, every library will lose data. Whether a lot all at once or a little over time, it will happen. We have had multiple data losses at NIU, both large and small. Losing months of work—and thousands of dollars in staff time—is a great motivator to sit down and think seriously about digital preservation.

A digital preservation policy is not just a plan for creating backups, checking the integrity of your files, and migrating formats. It also helps to ensure that you have an institutional (and administrative) commitment to the work and costs of preservation. Even if you are a one-person operation, it helps to have everyone on board with a clear articulation of values and goals.

Finally, having a policy helps when looking for money to fund new projects. Most funding agencies expect applicants to devote significant space in their proposals to digital preservation or “sustainability.” If you do not have a policy in place, then the project may look like a bad investment to the funder. When presenting a policy to administrators, it helps to frame digital preservation in terms of both risks and opportunities. If we do not invest a little money now, not only do we risk losing what we have already built, but we will not be able to bring in big grants in the future.

What are the associated costs of digital preservation?

Staff, software, and storage, but especially storage. We keep a copy of the archival file and a copy of the object with its derivatives in long-term or dark storage. While this space is cheap—only $0.004/GB per month—it does add up. We have a relatively modest collection, and our backups sit at around 21 TB, which carries an ongoing cost of over $80/month (21,000 GB × $0.004/GB ≈ $84). But if you do not have the staff to install and manage your own preservation system, the costs can be even greater. You may have to contract with a vendor to provide software or services (e.g., Preservica or DuraCloud).
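
To make that arithmetic concrete, here is a small Python sketch of the cost estimate. The figures are the ones quoted above; the function name and the 1,000 GB/TB conversion are illustrative assumptions, not an NIU tool.

    def monthly_storage_cost(terabytes: float, rate_per_gb: float = 0.004) -> float:
        """Estimate monthly cost: convert TB to GB, then multiply by the per-GB rate."""
        return terabytes * 1_000 * rate_per_gb

    # 21 TB at $0.004/GB per month:
    print(f"${monthly_storage_cost(21):.2f}/month")      # $84.00/month
    print(f"${monthly_storage_cost(21) * 12:.2f}/year")  # $1008.00/year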

What are some of the tasks associated with digital preservation and who performs these tasks?

Most of our digital preservation workflows are automated. When a new object is added to the digital repository, each file associated with that object is run through FITS—a free and open-source tool that identifies and validates file formats—to record format, size, and other file characteristics, after which a checksum is created. A checksum is a unique sequence of letters and numbers computed from a file's contents, so any change to the file produces a different checksum. A separate backup job makes a copy of the archival file (for example, the TIFF) and a complete export of the object, including all derivatives, which are temporarily stored in Amazon Web Services S3 (AWS S3), then transferred to Glacier for long-term storage.

Periodically, a checksum is taken of every file in the repository, then compared against the checksum that has been stored with each file as part of the object. If a mismatch is found, a good copy of the file is retrieved from backup storage and used to automatically replace the corrupted file. I receive notification of the error to confirm that the file was successfully replaced and everything is working as expected.

In general, this means that my role is to make sure these automated processes are working as intended: that backups are running, checksums are being verified, and files are being successfully recovered when an error is found.
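
To illustrate the checksum and fixity-checking steps in this workflow, here is a minimal Python sketch. It is not NIU's actual system: the JSON manifest, file layout, and choice of SHA-256 are assumptions for the example, and the FITS characterization and S3/Glacier backup steps are omitted.

    import hashlib
    import json
    from pathlib import Path

    # Hypothetical manifest mapping file paths to their stored checksums.
    MANIFEST = Path("checksums.json")

    def sha256sum(path: Path) -> str:
        """Compute a checksum from the file's contents, reading in 1 MiB chunks."""
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def record_checksums(repo_dir: Path) -> None:
        """On ingest: store a checksum for every file in the repository."""
        manifest = {str(p): sha256sum(p) for p in repo_dir.rglob("*") if p.is_file()}
        MANIFEST.write_text(json.dumps(manifest, indent=2))

    def verify_checksums() -> list:
        """Periodic fixity check: recompute each checksum against the stored value."""
        manifest = json.loads(MANIFEST.read_text())
        return [p for p, stored in manifest.items() if sha256sum(Path(p)) != stored]

    if __name__ == "__main__":
        for path in verify_checksums():
            # A mismatch means the file changed or was corrupted; in the workflow
            # described above, a good copy would be restored from dark storage.
            print(f"Checksum mismatch: {path}")

In practice, tools such as the Library of Congress's bagit-python library package exactly this manifest-and-verify pattern, and the replacement step would retrieve the good copy from S3 or Glacier.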

Do you have any advice/tips/words to the wise for libraries that are working on developing a digital preservation policy?

The perfect is the enemy of the good. Start small and build out your preservation policy or system from there. A backup of files somewhere is better than nothing at all.

Return to Preservation Interviews: Learning from our Collective Experiences, the homepage of the Preservation Committee's 2019-2020 Annual Project.