Archival Preservation 101: Preservation of Digital Material

Tonia Grafakos, Marie A. Quinlan Director of Preservation, Northwestern University

Libraries and archives often find themselves accessioning collections containing disparate materials. Collections come into institutions containing papers, photographs, ephemera, hard drives, compact discs, telephones, websites, email, and social media accounts. Many institutions are well versed in handling physical collections but are often unsure how best to preserve and store digital collections. Tonia Grafakos, Director of Preservation, asked Kelsey O’Connell, Digital Archivist at Northwestern University Libraries, a series of questions meant to shed light on the role of a digital archivist and how institutions can preserve their digital collections.

Interview with Kelsey O’Connell:

How would you describe the work of a digital archivist?

A digital archivist specializes in preserving, managing, processing, and providing access to archival and historical documents that are in digital formats. Some digital archivists specialize only in managing born-digital items in the collections, and some are also responsible for managing the digitized assets of the collections. Like a traditional archivist, they need to be able to steward the materials from donation into preservation and apply various standards and activities to the data to help preserve it throughout its life cycle and make it accessible to patrons. Depending on the needs of the workplace, the digital archivist should be able to work with donors to help them transfer their digital files to the institution; manage the accessioning, processing, security, digital preservation, and access to the collections; and provide instruction on how to use and research with digital archival materials. Since technology is continuously evolving, a digital archivist also needs to be able to design, amend, and update workflows, policies, and protocols relating to the digital collections. They also need to assess, implement, and use various specialized hardware and software that help manage digital collections.

What do you find the most challenging part of your job?

A digital archivist needs to be able to provide consultation and education to address the misconceptions and assumptions that donors and patrons hold about digital files. This can be challenging because of the variety of those misconceptions and the expectations about the life cycle of digital files that donors or patrons form based on them. And because technologies and the ways we use them are constantly evolving, existing misconceptions can be exacerbated and new ones created.

For instance, if something is published on the web or saved in a cloud drive, there are misconceptions that those methods of storage and publishing are preservation, and no other steps need to be taken. This may prevent donors from wanting to officially transfer their digital files to an archival repository under the assumption that the files are already accessible online. A digital archivist needs to tactfully educate the donors about the limitations of web publishing and storage to help them understand why formal digital preservation through an archival repository is a safer option without inundating them with jargon or too many details. Therefore, it is necessary to gauge donors’ or patrons’ perceptions and their actual comprehension of digital literacy in order to provide them with the appropriate recommendations and assistance.

Are there different preservation standards for different digital media types such as PDFs, Word documents, MP3, or MOV files?

It will depend on the needs of your institution, but typically, different file formats do not need different standards of preservation. Refer to the NDSA’s Levels of Digital Preservation for a foundational framework to help design the digital preservation actions most relevant to your institution’s abilities and needs. At minimum, it is helpful to have all your file formats documented so you know what you have in your collections. This inventory then informs what standard and/or specialized digital preservation actions to take on files.

When preserving files, you should maintain a preservation copy (formerly referred to as the master copy) and a derivative (or access) copy of each file. The derivative copy is what is shared with researchers, which reduces the risk of loss, corruption, and/or alteration to the original file (i.e., the preservation copy). The derivative is generated from the preservation copy and can be a copy in the exact same file format as the original, or it can be migrated to a new format for access and preservation. For instance, a text document originally created as a WordPerfect file may need to be migrated to a new format, such as an OpenOffice document or a PDF, to be more widely accessible and usable in current or open software solutions. By contrast, a TIF image, which can be rendered in many image software programs, can be used as both a preservation and a derivative copy format, but you may still want to migrate the derivative to a JPG, since it is a compressed format that takes up less storage space.

You will need to assess your institution's needs for preserving and accessing the files, which is why an inventory of all your file formats is a great place to begin developing those protocols.
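As a starting point for such an inventory, the following is a minimal Python sketch, not a recommended tool. It assumes the files sit in an ordinary folder tree and that file extensions are an acceptable first approximation of format, which they are not always, since extensions can be missing or wrong. The function name `inventory_formats` is hypothetical.

```python
from collections import Counter
from pathlib import Path


def inventory_formats(root):
    """Count the files under `root` by lowercase file extension.

    Assumption: the extension is only a rough proxy for the real format.
    Files with no extension are grouped under "(no extension)".
    """
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            extension = path.suffix.lower() or "(no extension)"
            counts[extension] += 1
    return counts
```

Calling `inventory_formats("path/to/collection").most_common()` returns the extensions ordered from most to least frequent, which can help show where format-specific decisions (such as migrating WordPerfect files) are likely to matter most.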

How do institutions archive websites and social media?

One method to archive social media is getting your donor to export a copy of their social media accounts and transfer those files to you. Your donor can log into their personal account on social media, locate the “export my data” functions in their account settings, and make selections about which files to export. Users may choose to export the entirety of their social media profile, or only some elements. For instance, when exporting account data from Facebook, users can choose whether they want to export only their profile page, their uploaded content, and/or content they have been tagged in by other users. If the donor is still actively using and updating their social media accounts, you may need to arrange recurring transfers of these social media accounts to cover future gaps in the collection.

Similarly, donors can export copies of their personal websites depending on what web hosting provider they are using and how they upload and access the files used to create the website. If the donor is able to export the entirety of all the code, style sheets, and content in its original folder organization, then a snapshot of the website can be saved and can be accessed in a web browser. Again, if the donor is regularly updating or changing the website, you can either set up recurring transfers of the site or use a web crawler.

If exporting personal copies of social media accounts or websites is not an option or wouldn't work for your institution, web crawling technologies offer a comprehensive way to capture multiple snapshots of these sites. Tools like WebRecorder/Conifer, Archive-It, or your own web crawler provide this functionality. They allow archivists to set up recurring capture schedules so that periodic snapshots of the sites are collected over time, letting users see how a site has changed. See the International Internet Preservation Consortium's (IIPC) resources on web archiving for tips and tool suggestions.

What can be done to extract data from obsolete technology?

Implementing digital forensics hardware and software is one of the most efficient in-house ways to extract data from obsolete technology. Hardware, such as external drives, data connectors, or obsolete data ports, can help access data on removable media such as floppy disks, SD media cards, or hard disk drives by connecting the media to modern computers. Various digital forensics software is available for purchase, for free, or for open-source implementation. For instance, COPTR maintains an inventory of various tools for forensics applications. A commonly implemented tool in libraries and archives is the BitCurator suite of tools, as it packages various digital forensics and digital processing tools into one environment. This is a Linux-based virtual machine, so it is highly adaptable to your own needs and can be a tool for accessing Mac-created files if you only use Windows PCs in your workspace. While there is a learning curve to the environment, the BitCurator Consortium maintains and provides many resources and training opportunities. Another commonly installed tool is FTK Imager; while it has less functionality than other tools, it is typically easier to pick up quickly, as its graphical user interface (GUI) can be simpler to learn than command-line interface (CLI) tools.

Additionally, there are various media and digitization companies you can employ to provide these reformatting and data recovery services. Measuring your institution’s staffing, computing resources, and expertise against the cost of outsourcing is helpful to determine whether it is more or less feasible to hire an external company to complete this work. When choosing a company, you will want to consider their prices, as well as the types of transfer methods used, your metadata needs, quality control methods, file formats, and any other technical needs relevant to your collections and workflows.

What can people do to help ensure that their digital material is properly preserved?

To help protect your data, make back-up copies and audit them regularly. The concept of “lots of copies keeps stuff safe” is still relevant today even as many of us utilize third-party cloud services for personal and professional data storage. While these third-party cloud providers, such as Apple, Google, Amazon, or Microsoft, are much more secure and reliable than earlier storage options, the storage is still owned by some other entity and not by you. Therefore, if you are using vendors for digital storage of any kind and have the means, confirm that the contract between your organization and the storage provider includes an exit strategy for retrieving your data should the vendor be unable to continue storing your assets.
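One common way to audit a back-up copy is to compare checksums of the originals against checksums of the copy. The Python sketch below illustrates the idea under stated assumptions: both copies are reachable as folders on the same computer, SHA-256 is the chosen checksum, and the function names (`sha256_of`, `build_manifest`, `audit_copy`) are hypothetical rather than taken from any particular tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to `root`) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def audit_copy(original_root, backup_root):
    """List files that are missing from the back-up or whose contents differ."""
    original = build_manifest(original_root)
    backup = build_manifest(backup_root)
    missing = sorted(set(original) - set(backup))
    changed = sorted(
        name
        for name in original.keys() & backup.keys()
        if original[name] != backup[name]
    )
    return {"missing": missing, "changed": changed}
```

If both lists in the result are empty, every original file has a byte-identical counterpart in the back-up. Saving the manifest at the time of transfer and re-running the comparison on a schedule is one way to turn "audit them regularly" into a routine.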

For both personal storage and commercially contracted digital storage, the 3-2-1 rule of digital preservation is still highly applicable and helpful: 3 copies of the files, on at least 2 different storage media, and in at least 1 different disaster risk geographical region. Digital preservation can be scalable and iterative to meet the needs of your organization; remember that doing something is better than doing nothing and that perfect (preservation) is the enemy of good.
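The 3-2-1 rule can be written down as a simple checklist. The Python sketch below is illustrative only: the `StoredCopy` record and the `check_3_2_1` function are hypothetical names, and it interprets "at least 1 different disaster risk geographical region" as meaning the copies span at least two distinct regions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredCopy:
    """One stored copy of a set of files (hypothetical record for illustration)."""
    label: str
    medium: str   # e.g. "local disk", "tape", "cloud"
    region: str   # the disaster-risk region where the copy is held


def check_3_2_1(copies):
    """Report which parts of the 3-2-1 rule a list of copies satisfies."""
    return {
        "three_copies": len(copies) >= 3,
        "two_media": len({copy.medium for copy in copies}) >= 2,
        # Assumption: "1 different region" means at least two distinct regions.
        "second_region": len({copy.region for copy in copies}) >= 2,
    }


copies = [
    StoredCopy("working copy", "local disk", "Chicago"),
    StoredCopy("server back-up", "local disk", "Chicago"),
    StoredCopy("cloud back-up", "cloud", "Oregon"),
]
report = check_3_2_1(copies)
```

In this made-up example all three checks pass. Dropping the cloud copy would fail all three, which shows how much a single off-site copy on a different medium contributes.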

What is a common misconception surrounding digital archiving?

A common misconception is assuming that because something is digital, it is easier to manage. Donors and researchers may expect that a newly acquired set of born-digital files can be made available without issue. But accessioning and processing a backlog of digital files can take just as long as, if not longer than, doing so for print/physical archival records. This is due to any additional digital forensics and/or preservation actions needed for the materials and the dependence on computer and network speeds to transfer files across systems into preservation storage. In addition, there are very few online search portals that can easily accommodate all born-digital file format types in ways that mimic the print archival researcher experience. Therefore, accessing born-digital files may be counterintuitive or cumbersome.

Are there any resources you would recommend for those interested in learning more about archiving born-digital material?

For a primer, The No-Nonsense Guide to Born-Digital Content by Heather Ryan and Walker Sampson (ISBN: 9781783302567) provides a comprehensive introduction to understanding, collecting, preserving, and providing access to born-digital archives.

The Council on Library & Information Resources (CLIR) has published several reports that are relevant to managing born-digital records including: Digital Forensics and Born-Digital Content in Cultural Heritage Collections by Matthew G. Kirschenbaum et al., and Born Digital: Guidance for Donors, Dealers, and Archival Repositories, by Gabriela Redwine et al.

There are various frameworks available to help design your own digital preservation and born-digital program including: the Digital Processing Framework by Erin Faulder et al., the Levels of Born-Digital Access by the DLF Born-Digital Access Working Group, and the NDSA’s Levels of Digital Preservation.

Additionally, there are many organizations that create and maintain standards for managing born-digital and web archives. Checking with these organizations to see what they have published or are researching will provide insight into various aspects of digital preservation and born-digital management. See organizations such as the BitCurator Consortium, Digital Library Federation, Council on Library & Information Resources, Software Preservation Network, International Internet Preservation Consortium, National Digital Stewardship Alliance, and Society of American Archivists.
