Digital Treasure Trove: Resolving Duplicate and Mystery Files

Contributed by PAFA Museum Collections

We are nearing the end of the weeks-long process of renaming the files in our digital collection. As mentioned before, this work was important for establishing a file naming convention that makes the collection more useful and accessible both internally and externally. It is one of many parts of our IMLS grant project and, so far, has combined automated scripting in Python with slowly combing through the filenames by hand.

With this combination of tools, we were able to confidently correct the names of approximately 16,000 files, a huge step forward for the project and PAFA’s collections team. During this process, however, we identified over 1,000 files that were either unnecessary duplicates or did not carry enough information to be adequately identified in our database.

These duplicate and mystery files arose from the old folder structures, which we have phased out as part of this file renaming process. Formerly, files could hide and get copied, moved, updated, and renamed without any meaningful way to ensure that outdated or unnecessary copies were removed. As we migrated files into fewer sub-directories, each identified only by a unique accession number, it became immediately clear which files needed review and correction beyond a simple renaming.
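
As a rough illustration of how sorting by accession number can surface files that need review, here is a minimal Python sketch. It is not the script we used: the accession-number pattern (a four-digit year followed by dot-separated numbers, such as "1985.12.3"), the optional underscore suffix, and the function name `sort_for_review` are all assumptions made for the example.

```python
import re
from pathlib import Path

# Hypothetical accession-number pattern such as "1985.12.3", optionally
# followed by "_description". The real naming schema is not shown here.
ACCESSION_RE = re.compile(r"^(\d{4}\.\d+(?:\.\d+)*)(?:_.+)?$")


def sort_for_review(filenames):
    """Group filenames by accession number and collect the rest for review."""
    identified = {}    # accession number -> list of filenames
    needs_review = []  # filenames with no recognizable accession number
    for name in filenames:
        # Match against the name without its file extension.
        match = ACCESSION_RE.match(Path(name).stem)
        if match:
            identified.setdefault(match.group(1), []).append(name)
        else:
            needs_review.append(name)
    return identified, needs_review
```

Any filename that lands in `needs_review` would be a candidate "mystery file" to check by hand, while an accession number with several files attached is a prompt to look for duplicates.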

Once the duplicates were double-checked, they were easily discarded. For the unknown files, however, we had no way of knowing how important any given file might be. Out of a commitment to thoroughness, the collections team decided to work together to identify several hundred of these files, often by finding physical works in our storage vault that could be related. Only once we reached reasonable certainty did we know how to rename (and then either keep or discard) each mystery file. This process alone took around two weeks, but it has resulted in a repaired digital archive that is ready for the introduction of our new high-resolution photography from this year.
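
For readers curious about what double-checking duplicates can look like in practice, one common approach is to compare file contents rather than filenames. The sketch below is an assumption-laden example, not a record of our actual process: it hashes every file under a folder with SHA-256 and reports groups of files whose contents are byte-for-byte identical. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are identical."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

A check like this only finds exact copies. Two different scans or edits of the same work will hash differently, so those still call for the kind of hands-on review described above.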

About the Institute of Museum and Library Services

The Institute of Museum and Library Services is the primary source of federal support for the nation’s libraries and museums. We advance, support, and empower America’s museums, libraries, and related organizations through grantmaking, research, and policy development. Our vision is a nation where museums and libraries work together to transform the lives of individuals and communities. To learn more, follow us on Facebook and Twitter.
