Today, Harvard announced that it would make 12 million catalog records, nearly all of the records from its 73 libraries, publicly available. The records include bibliographic information about items in a wide range of media, and the Harvard Library has made all this metadata available under a Creative Commons Zero (CC0) public domain dedication.
“With this major contribution, developers will be able to start experimenting with building innovative applications that put to use the vital national resource that consists of our local public and research libraries, museums, archives and cultural collections,” said DPLA Steering Committee Chair John Palfrey in the official Harvard Library press release. He also stated that he hoped the Harvard release would set a precedent for other institutions’ collections.
The records are available for bulk download in MARC21 format. The DPLA Tech Dev team has incorporated the records into its database and is making them available through an API. API documentation from the team is available here. The API is in early alpha, so any and all feedback is welcome as the team continues to refine it. The team celebrated the metadata release in its post, “Going live with Harvard’s catalog.”
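For developers curious about what working with the bulk MARC21 download might involve, here is a minimal sketch in Python using only the standard library. It follows the general MARC21 record layout (a 24-byte leader, a directory of 12-byte entries, then the field data) and is not taken from Harvard's or the DPLA team's code. The function names `iter_records`, `parse_record`, and `build_record` are hypothetical, the sample record is invented for illustration, and the sketch assumes UTF-8 text, which may not hold for every record in a real file. For real work, an established MARC library such as pymarc is likely the better choice.

```python
# Minimal MARC21 (ISO 2709) reader: a sketch, not a full implementation.
# Assumes UTF-8 field data; real catalog files may use MARC-8 instead.

FIELD_END = b"\x1e"    # terminates the directory and every field
RECORD_END = b"\x1d"   # terminates every record
SUBFIELD = b"\x1f"     # precedes each subfield code


def iter_records(blob: bytes):
    """Yield raw records from a bulk file, using the length in each leader."""
    pos = 0
    while pos < len(blob):
        length = int(blob[pos:pos + 5].decode("ascii"))  # leader bytes 0-4
        yield blob[pos:pos + length]
        pos += length


def parse_record(raw: bytes) -> dict:
    """Return {tag: [values]} for one raw MARC21 record."""
    base = int(raw[12:17].decode("ascii"))   # leader bytes 12-16: data offset
    directory = raw[24:base - 1]             # drop the directory terminator
    fields = {}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base + start:base + start + length - 1]  # drop terminator
        if tag.startswith("00"):
            # Control fields (001-009) carry plain data, no indicators.
            value = data.decode("utf-8")
        else:
            # Data fields: two indicator bytes, then coded subfields.
            chunks = data[2:].split(SUBFIELD)[1:]
            value = {
                "indicators": data[:2].decode("ascii"),
                "subfields": [(c[:1].decode("ascii"), c[1:].decode("utf-8"))
                              for c in chunks],
            }
        fields.setdefault(tag, []).append(value)
    return fields


def build_record(fields) -> bytes:
    """Hypothetical helper: assemble a tiny record so the demo is runnable."""
    directory, body = b"", b""
    for tag, payload in fields:
        chunk = payload + FIELD_END
        directory += f"{tag}{len(chunk):04d}{len(body):05d}".encode("ascii")
        body += chunk
    directory += FIELD_END
    base = 24 + len(directory)
    total = base + len(body) + 1             # +1 for the record terminator
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + body + RECORD_END


# Invented sample data: a control number and a title statement.
sample = build_record([
    ("001", b"demo-0001"),
    ("245", b"10" + SUBFIELD + b"aAn example title /" + SUBFIELD + b"cA. Author."),
])

for raw in iter_records(sample + sample):    # two records back to back
    record = parse_record(raw)
    print(record["001"][0], dict(record["245"][0]["subfields"])["a"])
```

The many attributes per item that make this metadata interesting live in those tagged fields and subfields, so once records are parsed into a structure like this, filtering or counting by tag is ordinary dictionary work.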
The metadata could support substantial new work in the digital humanities. The New York Times’ Bits Blog discussed the release with David Weinberger: “‘This is Big Data for books,’ said David Weinberger, co-director of Harvard’s Library Lab. ‘There might be 100 different attributes for a single object.’ At a one-day test run with 15 hackers working with information on 600,000 items, he said, people created things like visual timelines of when ideas became broadly published, maps showing locations of different items, and a ‘virtual stack’ of related volumes garnered from various locations.”
Though metadata is distinct from the content it describes, this sort of data is essential to the growing DPLA, and its release is a vital step towards the project’s realization.