We’re very pleased not only that Harvard University has decided to make virtually its entire catalog of bibliographic records available for bulk download under a Creative Commons Zero (CC0) public domain dedication, but also that it is allowing the DPLA to provide programmatic access to those records in their entirety via the prototype platform’s API. That’s more than 12 million full records in MARC21 format.
We are using a two-tier schema. The first tier is a simplified core that combines and extends Dublin Core and Schema.org; it is designed to work across data sets as well as we can manage. The second tier preserves all the metadata that doesn’t fit into that core, and you can access it if you know the source schema. In the case of Harvard’s data, that schema is MARC21, so the keys are well known: you can retrieve entire MARC21 records if that’s where your bliss is, or you can grab just the fields you want.
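To make the two tiers concrete, here is a minimal sketch of what reading such a record might look like in Python. The record layout (the "core", "original", and "fields" keys) and the helper names core_field and marc_subfields are hypothetical, chosen for illustration; they are not the prototype API’s actual response format. The sample record is made up. The MARC21 tags themselves are standard: 100 is the main personal-name entry, 245 the title statement, and 650 a topical subject heading.

```python
from typing import Any, Dict, List, Optional

# Hypothetical two-tier record; key names are illustrative assumptions,
# not the DPLA prototype's actual schema. The data is invented.
record: Dict[str, Any] = {
    "core": {  # tier 1: simplified cross-collection fields
        "title": "An example title",
        "creator": "Doe, Jane",
    },
    "original": {  # tier 2: the untouched source metadata
        "format": "MARC21",
        "fields": {
            # MARC tags map to lists because many fields are repeatable.
            "100": [{"a": "Doe, Jane."}],
            "245": [{"a": "An example title /", "c": "Jane Doe."}],
            "650": [{"a": "Libraries"}, {"a": "Metadata"}],
        },
    },
}


def core_field(rec: Dict[str, Any], name: str, default: Optional[str] = None) -> Optional[str]:
    """Read a field from the shared core tier."""
    return rec.get("core", {}).get(name, default)


def marc_subfields(rec: Dict[str, Any], tag: str, code: str) -> List[str]:
    """Collect subfield `code` from every occurrence of MARC tag `tag`.

    Returns an empty list if the source record is not MARC21 or lacks the tag.
    """
    original = rec.get("original", {})
    if original.get("format") != "MARC21":
        return []
    occurrences = original.get("fields", {}).get(tag, [])
    return [occ[code] for occ in occurrences if code in occ]


print(core_field(record, "title"))         # core tier
print(marc_subfields(record, "650", "a"))  # source tier, by MARC tag
```

The design point the sketch tries to capture is that apps needing only cross-collection basics never touch the second tier, while MARC-savvy apps can address the original fields directly by their well-known tags.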
The API is at build level 0.03, so it’s very early alpha. Please let us know about problems you encounter.
We’ve also capped access at 3 queries per second from a single IP address. We are feeling our way here, and we think that’s probably more than any app is going to need for now, unless it’s trying to absorb all the data through the API. In that case, we repeat: go bulk download it. It’s all there, and we’ll all be much happier.
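If you are writing a client, one way to stay under a 3-queries-per-second cap is to space requests at least a third of a second apart on your side. The sketch below is a hypothetical helper, not part of any DPLA library; the class name Throttle and its parameters are assumptions, and the clock and sleep functions are injectable so the spacing can be checked without waiting.

```python
import time
from typing import Callable


class Throttle:
    """Hypothetical client-side limiter: spaces calls >= 1/rate seconds apart."""

    def __init__(
        self,
        rate_per_sec: float = 3.0,  # matches the announced per-IP cap
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / rate_per_sec
        self.clock = clock
        self.sleep = sleep
        self._next_allowed = float("-inf")  # first call never waits

    def wait(self) -> None:
        """Block until the next request slot is free, then reserve it."""
        now = self.clock()
        if now < self._next_allowed:
            self.sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
```

In use, you would call throttle.wait() immediately before each API request. Note that a single process honoring the limit does not protect you if several processes share one IP address; and if you find yourself throttling a loop over the whole catalog, that is the cue to use the bulk download instead.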
Thank you, Harvard!
And please note and respect the statement of community norms, including the norm that attribution be given to those who are providing this information: Harvard and also, importantly, OCLC. Thank you.