John Tenniel, c. 1864. Study for illustration to Alice’s Adventures in Wonderland. Harcourt Amory collection of Lewis Carroll, Houghton Library, Harvard University.

We’ve just completed the spring semester, during which I taught a system design course jointly in Engineering Sciences and Computer Science. The aim of ES96/CS96 is to help students learn the process of solving complex, real-world problems, applying engineering and computational design skills, by undertaking an extended, focused effort directed toward an open-ended problem defined by an interested “client”.

The students work independently as a self-directed team. The instructional staff provides coaching, but the students organize and carry out all of the work themselves, from fact-finding to design to development to presentation of their findings.

This term the problem concerned the Harvard Library’s exceptional special collections: vast holdings of rare books, archives, manuscripts, personal documents, and other materials that the library stewards. Harvard’s special collections are unique and invaluable, but they are useful only insofar as potential users of the material can find and gain access to them. Despite the herculean efforts of an outstanding staff of archivists, the scope of the collections means that large portions are not cataloged, or are cataloged in insufficient detail, making materials essentially unavailable for research. And this problem is growing as the cataloging backlog mounts. The students were asked to address core questions about this valuable resource: What accounts for this problem at its core? Can tools from computer science and technology help address it? Can they even qualitatively improve the utility of the special collections?

The clients were the leadership of Harvard’s premier Houghton and Schlesinger libraries. The students received briefings from William Stoneman, Florence Fearrington Librarian of Houghton Library, and Marilyn Dunn, Executive Director of the Schlesinger Library and Librarian of the Radcliffe Institute; toured both libraries; and met with a wide range of archivists and librarians, who were incredibly generous with their time and expertise. I’d like to express my deep appreciation and thanks to all of the library staff who helped out with the course. Their participation was vital.

The students’ recommendations centered on the design, development, and prototyping of an “archivist’s workstation” and the unconventional “flipped” collections processing that the workstation enables. Their process involves exhaustive but lightweight digitization of a collection as a precursor to highly efficient metadata acquisition on top of the digitized images, rather than the conventional approach of digitizing selectively only after all processing of the collection has been performed. The “digitize first” approach means that documents need only be touched once, with all of the sorting, arrangement, and metadata application performed virtually using optimized user interfaces that the students designed for these purposes. The output is a dynamic finding aid with images of all documents, complete with search and faceted browsing of the collection, to supplement the static finding aid of traditional archival processing. The students estimate that processing in this way would be faster than current methods, while delivering a superior result. Their demo video (below) gives a nice overview of the idea.

The deliverables for the course are now available at the course web site, including the final report and a videotape of their final presentation before dozens of Harvard archivists, librarians, and other members of the community.

I hope others find the ideas that the students developed as provocative and exciting as I do. I’m continuing to work with some of them over the summer and perhaps beyond, so comments are greatly appreciated.


I just sent the email below to my friends and family. Feel free to send a similar letter to yours.

You know me. I don’t send around chain letters, much less start them. So you know that if I’m sending you an email and asking you to tell your friends, it must be important.

This is important.

As taxpayers, we deserve access to the research that we fund. It’s in everyone’s interest: citizens, researchers, government, everyone. I’ve been working on this issue for years. I recently testified before a House committee about it.

Now we have an opportunity to tell the White House that they need to take action. There is a petition at the White House petition site calling for “President Obama to act now to implement open access policies for all federal agencies that fund scientific research.” If we get 25,000 signatures by June 19, 2012, the petition will be placed in the Executive Office of the President for a policy response.

Please sign the petition. I did. I was signatory number 442. Only 24,558 more to go.

Signing the petition is easy. You register at the White House web site, verify your email address, and then click a button. It’ll take five minutes tops. (If you’re already registered, you’re down to ten seconds.)

Please sign the petition, and then tell those of your friends and family who might be interested to do so as well. You can inform people by tweeting them this URL <http://bit.ly/MAbTHG> or posting on your Facebook page or sending them an email or forwarding them this one. If you want, you can point them to a copy of this email that I’ve put up on the web at <http://bit.ly/J8EmyD>.

Since I’ve just requested that you send other people this email (and that they do so as well), I want to make sure that there’s a chain letter disclaimer here: Do not merely spam every email address you can find. Please forward only to people you know well enough that it will be appreciated. Do not forward this email after June 19, 2012. The petition drive will be over by then. By all means, before forwarding the email, check the White House web link showing the petition at whitehouse.gov to verify that this isn’t a hoax. Feel free to modify this letter when you forward it, but please don’t drop the substance of this disclaimer paragraph.

You can find out more about the petition from the wonderful people at Access2Research who initiated it, and you can read more about my own views on open access to the scholarly literature at my blog, the Occasional Pamphlet.

Thank you for your help.

Stuart M. Shieber
Welch Professor of Computer Science
Director, Office for Scholarly Communication
Harvard University