“Old Books” photo by flickr user Iguana Joe, used by permission (CC-by-nc)

Earlier this week, the Harvard Library announced its new open metadata policy, which the Library Board approved earlier this year, along with the first two metadata releases under it. The policy is straightforward:

The Harvard Library provides open access to library metadata, subject to legal and privacy factors. In particular, the Library makes available its own catalog metadata under appropriate broad use licenses. The Library Board is responsible for interpreting this policy, resolving disputes concerning its interpretation and application, and modifying it as necessary.

The first releases under the policy include the metadata in the DASH repository. Though this metadata has been available through open APIs since early in the repository’s history, the policy makes explicit the open licensing terms under which the data is provided.

The release of most of the Harvard Library’s bibliographic metadata for its holdings is likely to have a much bigger impact. We’ve provided 12 million records, the vast majority of Harvard’s bibliographic data, describing Harvard’s library holdings in MARC format. They are released under a CC0 license, accompanied by a request to follow a set of community norms that I think are quite reasonable, primarily calling for attribution to Harvard and to our major partners in the release, OCLC and the Library of Congress.

OCLC in particular has praised the effort, saying it “furthers [Harvard's] mandate from their Library Board and Faculty to make as much of their metadata as possible available through open access in order to support learning and research, to disseminate knowledge and to foster innovation and aligns with the very public and established commitment that Harvard has made to open access for scholarly communication. I’m pleased to say that they worked with OCLC as they thought about the terms under which the release would be made.” We’ve gotten nice coverage from the New York Times, Library Journal, and Boing Boing as well.

Many people have asked what we expect people to do with the data. Personally, I have no idea, and that’s the point. I’ve seen over and over that when data is made openly available with the fewest impediments — legal and technical — people are incredibly creative about finding innovative uses for the data that we never could have predicted. Already, we’re seeing people picking up the data, exploring it, and building on it.

  • The Digital Public Library of America is making the data available through an API that presents it in a much more convenient form than the raw MARC record dump that Harvard is providing.
  • Within hours of release, Benjamin Bergstein had already set up his own search interface to the Harvard data using the DPLA API.
  • Carlos Bueno has developed code to parse the “wonky” MARC21 format of the Harvard Library Bibliographic Dataset, and has open-sourced it.
  • Alf Eaton has documented his own efforts to work with the bibliographic dataset, providing instructions for downloading and extracting the records and putting up all of the code he developed to massage and render the data. He outlines his plans for further extensions as well.

(I’m sure I’ve missed some of the ways people are using the data. Let me know if you’ve heard of others, and I’ll update this list.)

As I’ve said before, “This data serves to link things together in ways that are difficult to predict. The more information you release, the more you see people doing innovative things.” These examples are the first evidence of that potential.

John Palfrey, who was really the instigator of the open metadata project, has been especially interested in getting other institutions to make their own collection metadata publicly available, and the DPLA stands ready to help. They’re running a wiki with instructions on how to add your own institution’s metadata to the DPLA service.

It’s hard to list all the people who make initiatives like this possible, since there are so many, but I’d like to mention a few major participants (in addition to John): Jonathan Hulbert, Tracey Robinson, David Weinberger, and Robin Wendler. Thanks to them and to the many others who have helped in various ways.

One Response to “The new Harvard Library open metadata policy”

  1. What does one do with millions of MARC records? | Gavia Libraria Says:

    [...] Shieber, who is not a librarian, disclaims any sense of how the records will be used, though his curiosity on the point is gratifying. Permit the Loon to blue-sky a few potential [...]