Small Archives Creating Descriptive Metadata from Scratch

Rebecca Skirvin, Coordinator of Archives and Special Collections at North Central College
May 2017

An archivist working without additional staff to create descriptive metadata for digital collections faces a dilemma: how do you describe your collections well enough that they make sense to outsiders (and particularly to metadata aggregators) without spending a large amount of time on the process? In my first year as archivist at North Central College, I've experienced a variation on this dilemma. I inherited a pre-existing archives of mixed digital surrogates and born-digital objects that had been created on demand and to which metadata had not been attached systematically. For digitization projects going forward, I've decided on metadata standards, tools, and workflows that capture metadata in a shareable form and that can be carried out in stages, as time allows. For the purposes of this case study, I'm restricting myself to digital still images, though these processes could be adapted to audio recordings and video.

My digital image collections are for the most part created in one of two ways: photographs and documents in the North Central College Archives' analog collections are scanned on demand to create digital surrogates, and born-digital photographs are harvested from Flickr albums maintained by North Central's Office of Marketing and Communications and Sports Information Director. Since the aggregator I'm most likely to deal with is CARLI Digital Collections, I consulted their metadata standards and best practices as I planned the workflows and decided on the standards and tools described below. Like many one-person shops, I'm operating without a formal content repository, but I do have dedicated server space for storing digital collections.

Generally speaking, I prefer to embed as much descriptive metadata as possible into digital images so that the metadata can move along with the image and so that I can search digital images easily to retrieve them in the future. However, not all descriptive metadata fields that are available through Windows File Properties or IPTC (the easiest embedded metadata to use that I have found) map well to Dublin Core standards (the preferred standard for CARLI Digital Collections). So I supplement the embedded metadata with a spreadsheet for all Dublin Core fields required for CARLI Digital Collections using file names as unique identifiers. (Incidentally, this requires me to use good naming practices with respect to file names!) It should also go without saying that it's important to follow the best practices for creating shareable metadata outlined in CARLI's Guidelines for the Creation of Digital Collections: Best Practices for Descriptive Metadata.

For scan-on-demand projects, I want to capture important descriptive metadata at the point of creation (or soon after) and create a master 300 dpi .tif. Scanning as a .tif ensures that I don't have to re-scan the photo should another request come through and that I can supply an image for ingest into a repository. However, the immediate focus is on fulfilling the patron request. To save time, I have mapped as many required Dublin Core elements to the Windows File Properties fields as possible, so that I can open the Properties window right after the scan is created and fill in the pertinent information. I also save images to folders named after the photo collections in the Archives (such as Media-Photographs and OPI-Topical Photographs) so that I can add that information to the Relation field on the master spreadsheet. If I am scanning more than a few images, I use a metadata editing tool such as Photo Mechanic or Adobe Bridge to enter the IPTC information. After I add the IPTC metadata, I can create an access copy for patrons and fill in the full Dublin Core metadata on the spreadsheet at a later date. It is also possible to export metadata from Photo Mechanic or Adobe Lightroom (using a third-party plugin) into a .csv file, which can then be copied into the Dublin Core metadata spreadsheet.
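The last step, moving an exported .csv into the Dublin Core spreadsheet, could be scripted along the following lines. This is only a sketch: the export column headers (Filename, Caption, Photographer, and so on) are assumptions and will differ by tool and export settings, and the function names are hypothetical. The mapping itself follows the embedded-field list given at the end of this case study.

```python
import csv

# Assumed export column headers mapped to Dublin Core elements.
# The header names are hypothetical; adjust them to match a real export.
IPTC_TO_DC = {
    "Caption": "Title",
    "Source": "Relation",
    "Copyright": "Rights",
    "Headline": "Description",
    "Photographer": "Creator",
    "Date": "Date",
    "Keywords": "Subject",
}

# Column order for the Dublin Core spreadsheet (an assumed layout).
DC_COLUMNS = [
    "Identifier", "Title", "Creator", "Date", "Description", "Subject",
    "Relation", "Rights", "Publisher", "Language", "Format", "Type",
]


def iptc_row_to_dc(row, filename_column="Filename"):
    """Convert one exported row (a dict) into a Dublin Core row."""
    dc = {column: "" for column in DC_COLUMNS}
    # The file name serves as the unique identifier.
    dc["Identifier"] = (row.get(filename_column) or "").strip()
    for iptc_field, dc_element in IPTC_TO_DC.items():
        dc[dc_element] = (row.get(iptc_field) or "").strip()
    return dc


def convert_export(export_path, spreadsheet_path):
    """Read an exported .csv and write a Dublin Core .csv; return row count."""
    with open(export_path, newline="", encoding="utf-8") as src:
        rows = [iptc_row_to_dc(row) for row in csv.DictReader(src)]
    with open(spreadsheet_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=DC_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Fields with no embedded equivalent (Publisher, Language, Format, Type) are left blank so they can be filled in on the spreadsheet later.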

Dealing with photosets from Flickr has proven to be a bit trickier for a couple of reasons. Flickr exports files as .jpgs instead of the preferred .tifs, which raises issues that are outside the scope of this case study. I use the bulk downloader FlickrDownloadr, which will download the largest, "original" files and will incorporate important metadata (album title—which often includes the event title and date—and original upload date as well as the download date) into the folder and file names. It also preserves technical and descriptive metadata embedded in the image by the camera and photographer before upload to Flickr. It does preserve tags applied to the photograph after upload to Flickr, but these are exported as JSON sidecar files and not embedded in the photograph. These tags, besides the album titles, are often the only bits of descriptive metadata available for these photographs, so it is important to ensure that they are preserved. I am still working out the best way to do this; I am downloading more than seven years' worth of photos from Flickr, so embedding the tags manually is not feasible. One option is to write a Python script to read the JSON files and copy their contents to a .csv (which requires me to brush up on my coding); another is to approach the task strategically and embed the tags as photos are needed for patrons or for sharing with CARLI Digital Collections.
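The scripted option might look something like the sketch below, which walks the download folders, reads each JSON sidecar, and collects the tags into rows for a .csv. It rests on several assumptions that would need checking against real files: that each sidecar shares its image's file name, that the album is the parent folder, and that tags sit under a key called "tags" (a hypothetical name) as either a list or a comma-separated string.

```python
import csv
import json
from pathlib import Path

TAG_KEY = "tags"  # hypothetical key name; inspect a real sidecar file first


def tags_from_sidecar(data, tag_key=TAG_KEY):
    """Return a clean list of tag strings from one parsed sidecar (a dict)."""
    value = data.get(tag_key, [])
    if isinstance(value, str):
        # Assumption: a string value holds comma-separated tags.
        value = value.split(",")
    return [str(tag).strip() for tag in value if str(tag).strip()]


def collect_tags(root):
    """Walk root for .json sidecars and build one row per readable file."""
    rows = []
    for sidecar in sorted(Path(root).rglob("*.json")):
        try:
            data = json.loads(sidecar.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # skip unreadable sidecars rather than stop the run
        if not isinstance(data, dict):
            continue
        rows.append({
            "Identifier": sidecar.stem,      # file name without extension
            "Album": sidecar.parent.name,    # folder named after the album
            "Subject": "; ".join(tags_from_sidecar(data)),
        })
    return rows


def write_tags_csv(rows, csv_path):
    """Write the collected rows to a .csv file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["Identifier", "Album", "Subject"]
        )
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_tags_csv(collect_tags("flickr_downloads"), "flickr_tags.csv")` (both paths are placeholders) would produce a file whose Subject column can be reviewed against a controlled vocabulary before it goes into the Dublin Core spreadsheet.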

It is possible for someone working in a one-person shop to create shareable metadata from scratch; it just requires a lot of planning and thinking ahead. As I enter metadata into the spreadsheet, for example, I will need to keep in mind the preferred formats for dates and times. The other likely repository that I would share metadata with is the Illinois Digital Heritage Hub (IDHH), and I am doing as much as I can with my descriptive metadata to ensure it can be read and interpreted correctly by both CARLI and IDHH. I have included a list of the metadata elements that I embed in digital still images using Windows File Properties and IPTC, as well as the Dublin Core elements I use in the spreadsheet. As for the digital images I inherited, I am creating shareable metadata for them on a case-by-case basis; because they are nearly all .jpgs, and because it is not always easy to identify which part of the collection the originals came from, it is often more desirable to return to the original and rescan it as a .tif than to try to go back and add metadata.

Windows File Properties/IPTC Embedded Fields (Dublin Core element mapped to); required fields in bold:

  • Title/Description/Caption (DC: Title)
  • Source (DC: Relation)
  • Copyright (DC: Rights)
  • Headline (DC: Description)
  • Authors/Creators/Photographer (DC: Creator)
  • Time and Date Event Location (DC: Date), using the recommended YYYY-MM-DD format
  • Keyword/Structured Keyword (DC: Subject), using a controlled vocabulary

Additional Dublin Core Spreadsheet Fields:

  • Publisher
  • Language (required if applicable)
  • Identifier (the file name, so file names must be unique)
  • Format
  • Type (from the DCMI Type Vocabulary)
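
Because the spreadsheet depends on unique file names and consistently formatted dates, a small check run before sharing can catch slips early. The following is a minimal sketch, not a CARLI or IDHH tool: the function names are hypothetical, the default list of required fields is an assumption to be replaced with the aggregator's actual requirements, and only the full YYYY-MM-DD date form is accepted.

```python
import re
from collections import Counter
from datetime import date

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_valid_date(value):
    """True only for a real calendar date written as YYYY-MM-DD."""
    if not DATE_PATTERN.fullmatch(value):
        return False
    year, month, day = map(int, value.split("-"))
    try:
        date(year, month, day)
    except ValueError:
        return False  # e.g. 2017-02-30
    return True


def check_rows(rows, required=("Identifier", "Title")):
    """Return a list of problem descriptions for spreadsheet rows (dicts).

    The default `required` fields are an assumption for illustration.
    """
    problems = []
    counts = Counter(row.get("Identifier", "") for row in rows)
    for number, row in enumerate(rows, start=1):
        for field in required:
            if not (row.get(field) or "").strip():
                problems.append(f"row {number}: missing {field}")
        identifier = row.get("Identifier", "")
        if identifier and counts[identifier] > 1:
            problems.append(f"row {number}: duplicate Identifier {identifier}")
        value = (row.get("Date") or "").strip()
        if value and not is_valid_date(value):
            problems.append(f"row {number}: Date {value!r} is not YYYY-MM-DD")
    return problems
```

The rows can come straight from `csv.DictReader` on the saved spreadsheet; an empty list back means no problems were found by these particular checks.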