I’m pleased to forward on the announcement that the Harvard Open Access Project has just released an initial version of a guide on “good practices for university open-access policies”. It was put together by Peter Suber and me with help from many, including Ellen Finnie Duranceau, Ada Emmett, Heather Joseph, Iryna Kuchma, and Alma Swan. It has already received endorsements from the Coalition of Open Access Policy Institutions (COAPI), Confederation of Open Access Repositories (COAR), Electronic Information for Libraries (EIFL), Enabling Open Scholarship (EOS), Harvard Open Access Project (HOAP), Open Access Scholarly Information Sourcebook (OASIS), Scholarly Publishing and Academic Resources Coalition (SPARC), and SPARC Europe.

The official announcement is provided below, replicated from the Berkman Center announcement.

Karen Spärck Jones, 1935-2007

In honor of Ada Lovelace Day 2012, I write about the only female winner of the Lovelace Medal awarded by the British Computer Society for “individuals who have made an outstanding contribution to the understanding or advancement of Computing”. Karen Spärck Jones was the 2007 winner of the medal, awarded shortly before her death. She also happened to be a leader in my own field of computational linguistics, a past president of the Association for Computational Linguistics. Because we shared a research field, I had the honor of knowing Karen and the pleasure of meeting her on many occasions at ACL meetings.

One of her most notable contributions to the field of information retrieval was the idea of inverse document frequency. Well before search engines were a “thing”, Karen was among the leaders in figuring out how such systems should work. Already in the 1960s there had arisen the idea of keyword searching within sets of documents, and the notion that the more “hits” a document receives, the higher ranked it should be. Karen noted in her seminal 1972 paper “A statistical interpretation of term specificity and its application in retrieval” that not all hits should be weighted equally. For terms that are broadly distributed throughout the corpus, their occurrence in a particular document is less telling than occurrence of terms that occur in few documents. She proposed weighting each term by its “inverse document frequency” (IDF), which she defined as log(N/(n + 1)) where N is the number of documents and n the number of documents containing the keyword under consideration. When the keyword occurs in all documents, IDF approaches 0 for large N, but as the keyword occurs in fewer and fewer documents (making it a more specific and presumably more important keyword), IDF rises. The two notions of weighting (frequency of occurrence of the keyword together with its specificity as measured by inverse document frequency) are combined multiplicatively in the by now standard tf*idf metric; tf*idf or its successors underlie essentially all information retrieval systems in use today.
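To make the weighting concrete, here is a minimal Python sketch of the log(N/(n + 1)) form quoted above, combined with raw term counts to give a tf*idf score. The toy corpus, the function names `idf` and `tf_idf`, the choice of natural logarithm, and the use of raw counts for term frequency are all assumptions made for illustration; they are not taken from the 1972 paper or from any particular retrieval system.

```python
import math
from collections import Counter

def idf(term, docs):
    """Inverse document frequency in the form log(N / (n + 1)).

    N is the number of documents; n is the number of documents that
    contain the term. The natural log is an assumption; the base only
    rescales the weights.
    """
    N = len(docs)
    n = sum(1 for doc in docs if term in doc)
    return math.log(N / (n + 1))

def tf_idf(term, doc, docs):
    """Raw count of the term in one document, multiplied by its IDF."""
    return Counter(doc)[term] * idf(term, docs)

# Hypothetical toy corpus: each document is a list of lowercase tokens.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
    "the retrieval system ranked the documents".split(),
]

# Compare a term found in every document, in two documents, and in one.
for term in ("the", "cat", "bird"):
    print(term, round(idf(term, docs), 3), round(tf_idf(term, docs[0], docs), 3))
```

In this toy corpus “the” appears in every document and so gets the lowest weight, while “bird”, which appears in only one, gets the highest. With so few documents the weight for “the” is slightly negative rather than exactly zero, since log(N/(N + 1)) only approaches 0 as N grows.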

In Karen’s interview for the Lovelace Medal, she opined that “Computing is too important to be left to men.” Ada Lovelace would have agreed.

…set the default…

Here’s what’s on deck at Harvard for Open Access Week 2012 (reproduced from the OSC announcement).


From October 22 through October 28, Harvard University is joining hundreds of other institutions of higher learning to celebrate Open Access Week, a global event for the promotion of free, immediate online access to scholarly research.

Harvard will participate in OA Week locally by offering two public events that engage this year’s theme, “Set the default to open access.”

On October 23rd at 12:30 p.m., the Berkman Center for Internet & Society and the Office for Scholarly Communication will host a forum entitled “How to Make Your Research Open Access (Whether You’re at Harvard or Not).” OA advocates Peter Suber and Stuart Shieber will headline the session, answering questions on any aspect of open access and recommending concrete steps for making your work open access. The event will be held at the Berkman Center, 23 Everett Street, 2nd Floor. The Berkman Center will also stream the discussion live online. See the Berkman Center website for more information and to RSVP.

On October 24, a panel of experts will consider efforts by the National Institutes of Health to ensure public access to the published results of federally funded research. “Open Access to Health Research: Future Directions for the NIH Public Access Policy” will feature a discussion of the challenges and opportunities for increasing compliance with the NIH policy. The event, co-sponsored by the Office for Scholarly Communication, Right to Research Coalition, and Universities Allied for Essential Medicines, will be held at the Harvard Law School in Hauser Hall, room 104. More information is available at the Petrie-Flom Center website.