Going live with Harvard’s catalog

[Note: Dec. 3, 2013: We’ve updated the links on this page and a bit of the text to reflect the current reality about where things are.]

We’re very pleased not only that Harvard University has decided to make virtually its entire catalog of bibliographic records available for bulk download under a Creative Commons 0 (public domain) license, but that we’re providing programmatic access to those records in their entirety through the LibraryCloud API. That’s over 12 million full records in the MARC21 format.

It’s live now. Begin with the API documentation (which includes some legal usage notes) here. If you instead want to do a bulk download, please go here.

We are using a two-tier schema. We have a simplified core which combines and extends Dublin Core and Schema.org. It works across data sets as well as we can manage. But we are preserving all the metadata that doesn’t fit into that core. You can access it if you know the schema. In the case of Harvard’s data, it’s MARC21, so the keys are well-known. You can retrieve entire MARC21 records if that’s where your bliss is, or you can grab the fields you want.
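To make the two-tier idea concrete, here is a rough sketch in Python of how a client might model such a record. Everything in it is an assumption made for illustration: the `Record` class, the `get_field` and `pick` helpers, the core key names and the sample values are hypothetical and are not the LibraryCloud API's actual response format. The only things taken from MARC21 itself are the well-known tags 245 (title statement), 100 (main entry, personal name) and 020 (ISBN).

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical two-tier record: a simplified core plus the native metadata."""
    core: dict                      # simplified core fields (key names assumed)
    native_schema: str              # name of the source schema, e.g. "MARC21"
    native: dict = field(default_factory=dict)  # everything that did not fit the core


def get_field(record, key):
    """Look in the simplified core first, then fall back to the native metadata."""
    if key in record.core:
        return record.core[key]
    return record.native.get(key)


def pick(record, keys):
    """Grab only the fields you want, from either tier."""
    return {key: get_field(record, key) for key in keys}


# Made-up sample data, not a real catalog record.
example = Record(
    core={"title": "An example title", "creator": "Example, Author"},
    native_schema="MARC21",
    native={
        "245": {"a": "An example title"},   # MARC21 title statement
        "100": {"a": "Example, Author"},    # MARC21 main entry, personal name
        "020": {"a": "0000000000"},         # MARC21 ISBN (placeholder value)
    },
)

selected = pick(example, ["title", "020"])
```

The point of the fallback in `get_field` is that a caller who only knows the core never has to touch MARC21, while a caller who does know the schema can ask for a tag such as 020 directly.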

The API is an early alpha. Please let us know about problems you encounter.

We’ve also capped access at 3 queries per second from a single IP address. We are feeling our way here, and we think that’s probably more than any app is going to need for now, unless it’s trying to absorb all the data through the API, in which case we repeat: Go bulk download it. It’s all there, and we’ll all be much happier.
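If you want your app to stay under that cap on its own, one simple approach is to space requests out on the client side. The sketch below is only one way to do it, under stated assumptions: the `Throttle` class is a hypothetical helper, not part of the API, and it says nothing about how the server enforces the limit or what it returns when you exceed it. The `clock` and `sleep` parameters exist only so the timing can be swapped out in tests.

```python
import time


class Throttle:
    """Hypothetical client-side limiter that spaces calls evenly."""

    def __init__(self, max_per_second=3, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second  # minimum seconds between calls
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self.clock()
        delay = 0.0
        if self.next_allowed is not None and now < self.next_allowed:
            delay = self.next_allowed - now
            self.sleep(delay)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval
        return delay


def run_throttled(calls, throttle):
    """Run each zero-argument callable in turn, waiting before each one."""
    results = []
    for call in calls:
        throttle.wait()
        results.append(call())
    return results
```

You would wrap each query in a callable and pass the list to `run_throttled` along with a `Throttle(max_per_second=3)`. Even spacing is a bit more conservative than the cap strictly requires, which seems like the right side to err on while the limit is still being tuned.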

Thank you, Harvard!

And please note and respect the statement of community norms, including the norm that attribution be given to those who are providing this information, including Harvard and also, importantly, the OCLC. Thank you.

Looking for collections

Now that we’re more public, we’ll be blogging more (I hope and intend).

We’re working on getting our initial build up, and have run into some of the usual sorts of problems getting it mounted on our new VM. It may take a day or two.

In the meantime, we’re continuing to look for collection metadata we can make accessible through the platform’s API. The ideal collection metadata (for our nefarious purposes) would be completely unencumbered, would attach both at the level of the collection and of items, and would point at an interesting variety of media types. If you’ve got some lying around, let us know. In fact, you can be the first person to try our email address: dev@dp.la. And if that doesn’t work, for now use my address: self@evident.com

To those who have contributed already or are in the process: Thanks!