Front steps of National Library of Medicine, 2008, photo courtesy of NIH Image Bank

Imagine my surprise when I actually received a response to my letters in recognition of the NIH public access policy, a form letter undoubtedly, but nonetheless gratefully received. And as a side effect, it allows us to gauge the understanding of the issues in the pertinent offices.

The letter, which I’ve duplicated below in its entirety, addresses two of the issues that I raised in my letters: the expansion of the policy to other agencies and the desirability of a reduction in the embargo period.

With regard to expanding the NIH policy to other funding agencies, the response merely notes the America COMPETES Act’s charge to establish a working group to study the matter, which is fine as far as it goes but not an indication of support for expansion itself.

With regard to the embargo issue, the response seems a bit confused as to how things work in the real world. Let’s look at some sentences from the pertinent paragraph:

  • “As you may know, the 12-month delay period specified by law (Division G, Title II, Section 218 of P.L. 110-161) is an upper limit. Rights holders (sometimes the author, and sometimes they transfer some or all of these rights to publishers) are free to select a shorter delay period, and many do.” This is of course true. My hope, and that of many others, is to decrease this maximum.
  • “The length of the delay period is determined through negotiation between authors and publishers as part of the copyright transfer process.” Well, not so much. Authors don’t so much negotiate with publishers as just sign whatever publishers put in their path. When an author actually attempts to negotiate, which is sadly rare among academics, things often go smoothly but sometimes take a turn for the odd, and authors in the thrall of publish or perish are short on negotiating leverage.
  • “These negotiations can be challenging for authors, and our guidance (http://publicaccess.nih.gov/FAQ.htm#778) encourages authors to consult with their institutions when they have questions about copyright transfer agreements.” I have a feeling that the word challenging is a euphemism for something else, but I’m not sure what. The cited FAQ doesn’t in fact provide guidance on negotiation, but just language to incorporate into a publisher agreement to make it consistent with the 12-month embargo. No advice on what to do if the publisher refuses, much less how to negotiate shorter embargoes. As for the excellent advice to “consult with their institutions”, in the case of Harvard, that kind of means to talk with my office, doesn’t it? Which, I suppose, is a vote of confidence.

So there is some room for improvement in understanding the dynamic at play in author-publisher relations, but overall, I’m gratified that NIH folks are on top of this issue and making a good faith effort to bring the fruits of research to the scholarly community and the public at large, and I reiterate my strong support of NIH’s policy.

Here’s the full text of the letter:

DEPARTMENT OF HEALTH & HUMAN SERVICES
Public Health Service
National Institutes of Health
Bethesda, Maryland 20892

May 27 2011

Stuart M. Shieber, Ph.D.
Welch Professor of Computer Science, and
Director, Office for Scholarly Communication
1341 Massachusetts Avenue
Cambridge, Massachusetts 02138

Dear Dr. Shieber:

Thank you for your letters to Secretary Sebelius and Dr. Collins regarding the NIH Public Access Policy. I am the program manager for the Policy, and have been asked to respond to you directly.

We view the policy as an important tool for ensuring that as many Americans as possible benefit from the public’s investment in research through NIH.

I appreciate your suggestions about reducing the delay period between publication and availability of a paper on PubMed Central. As you may know, the 12-month delay period specified by law (Division G, Title II, Section 218 of P.L. 110-161) is an upper limit. Rights holders (sometimes the author, and sometimes they transfer some or all of these rights to publishers) are free to select a shorter delay period, and many do. The length of the delay period is determined through negotiation between authors and publishers as part of the copyright transfer process. These negotiations can be challenging for authors, and our guidance (http://publicaccess.nih.gov/FAQ.htm#778) encourages authors to consult with their institutions when they have questions about copyright transfer agreements.

I also appreciate your suggestion to expand this Policy to other Federal science funders, and the confidence it implies in our approach. The National Science and Technology Council (NSTC) has been charged by the America COMPETES Reauthorization Act of 2010 (P.L. 111-358) to establish a working group to explore the dissemination and stewardship of peer reviewed papers arising from Federal research funding. I am copying Dr. Celeste Rohlfing at the Office of Science and Technology Policy on this correspondence, as she is coordinating the NSTC efforts on Public Access.

Sincerely,

Neil M. Thakur, Ph.D.
Special Assistant to the NIH Deputy Director for Extramural Research

cc: Ms. Celeste M. Rohlfing
Assistant Director for Physical Sciences
Office of Science and Technology Policy
Executive Office of the President
725 17th Street, Room 5228
Washington, DC 20502

Dictionary and red pencil, photo by novii, on Flickr

Sanford Thatcher has written a valuable, if anecdotal, analysis of some papers residing on Harvard’s DASH repository (Copyediting’s Role in an Open-Access World, Against the Grain, volume 23, number 2, April 2011, pages 30-34), in an effort to get at the differences between author manuscripts and the corresponding published versions that have benefited from copyediting.

“What may we conclude from this analysis?” he asks. “By and large, the copyediting did not result in any major improvements of the manuscripts as they appear at the DASH site.” He finds that “the vast majority of changes made were for the sake of enforcing a house formatting style and cleaning up a variety of inconsistencies and infelicities, none of which reached into the substance of the writing or affected the meaning other than by adding a bit more clarity here and there” and expects therefore that the DASH versions are “good enough” for many scholarly and educational uses.

Although more substantive errors did occur in the articles he examined, especially in the area of citation and quotation accuracy, they were typically carried over to the published versions as well. He notes that “These are just the kinds of errors that are seldom caught by copyeditors.”

One issue that goes unmentioned in the column is the occasional introduction of errors by the typesetting and copyediting process itself. This used to happen with great frequency in the bad old days when publishers rekeyed papers to typeset them. It was especially problematic in fields like my own, in which papers tend to have large amounts of mathematical notation, about whose niceties the typesetting staff had little clue. These days more and more journals allow authors to submit LaTeX source for their articles, to which the publisher merely applies the house style file. This practice has been a tremendous boon to the accuracy and typesetting quality of mathematical articles. Still, copyediting can introduce substantive errors in the process. Here’s a nice example from a paper in the Communications of the ACM:

“Besides getting more data, faster, we also now use much more sophisticated learning algorithms. For instance, algorithms based on logistic regression and that support vector machines can reduce by half the amount of spam that evades filtering, compared to Naive Bayes.” (Joshua Goodman, Gordon V. Cormack, and David Heckerman, Spam and the ongoing battle for the inbox, Communications of the Association for Computing Machinery, volume 50, number 2, 2007, page 27.  Emphasis added.)

Any computer scientist would immediately see that the sentence as published makes no sense. There is no such thing as a “vector machine” and in any case algorithms don’t support them. My guess is that the author manuscript had the sentence “For instance, algorithms based on logistic regression and support vector machines can reduce by half…” — without the word that. The copyeditor apparently didn’t realize that the noun phrase support vector machine is a term of art in the machine learning literature; the word support was not intended to be a verb here. (Do a Google search for vector machine. Every hit has the phrase in the context of the term support vector machine, at least for the pages I looked at before boredom set in.)

Presumably, the authors didn’t catch the error introduced by the copyeditor. The occurrence of errors of this sort is no argument against copyediting, but it does demonstrate that it should be viewed as a collaborative activity between copyeditors and authors, and better tools for collaboratively vetting changes would surely be helpful.

In any case, back to Dr. Thatcher’s DASH study. Ellen Duranceau at MIT Libraries News views the study as “support for the MIT faculty’s approach to sharing their articles through their Open Access Policy”, and the same could be said for Harvard as well. However, before we declare victory, it’s worth noting that Dr. Thatcher did find differences between the versions, and in general the edits were beneficial.

The title of Dr. Thatcher’s column gets at the subtext of his conclusions, that in an open-access world, we’d have to live with whatever errors copyediting would have caught, since we’d be reading uncopyedited manuscripts. But open-access journals can and do provide copyediting as one of their services, and to the extent that doing so improves the quality of the articles they publish and thus the imprimatur of the journal, it has a secondary benefit to the journal of improving its brand and its attractiveness to authors.

I admit that I’m a bit of a grammar nerd (with what I think is a nuanced view that manages to be linguistically descriptivist and editorially prescriptivist at the same time) and so I think that copyediting can have substantial value. (My own writing was probably most improved by Savel Kliachko, an outstanding editor at my first employer SRI International.) To my mind, the question is how to provide editing services in a rational way. Given that the costs of copyediting are independent of the number of accesses, and that the value accrues in large part to the author (by making him or her look like less of a halfwit for exhibiting “inconsistencies and infelicities” and occasionally more substantive errors), it seems reasonable that authors ought to pay publishers a fee for these services. And that is exactly what happens in open-access journals. Authors can decide if the bargain is a good one on the basis of the services that the publisher provides, including copyediting, relative to the fee the publisher charges. As a result, publishers are given incentive to provide the best services for the dollar. A good deal all around.

Most importantly, in a world of open-access journals the issue of divergence between author manuscripts and publisher versions disappears, since readers are no longer denied access to the definitive published version. Dr. Thatcher concludes that the benefits of copyediting were not as large as he would have thought. Nonetheless, however limited the benefits might be, properly viewed, those benefits argue for open access.