Via Larry Solum’s Legal Theory Blog comes word of an important announcement from the editors of the Northwestern University Law Review. The editors have been paying close attention to the open-access debate (see here for Bill’s terrific compilation of links to many of the most interesting recent posts), and after giving the matter careful thought, are putting themselves squarely on the side of the good guys. From their announcement:
Starting with the fourth issue of our ninety-ninth volume and moving forward, all of our content has been, and will continue to be, available as a PDF download through our past issues tab. As a result, anyone will be able to find Northwestern University Law Review content using an internet search engine, and download it for free. Furthermore, we will maintain a fully permissive policy regarding authors who wish to post drafts of their forthcoming articles to SSRN, Bepress or other locations on the web. That’s the easy part.
The hard part is that we are currently sitting on a mountain of information which is not readily convertible to PDF format — nearly 100 years of scholarship published solely in print in the Law Review. We are committed to making this information freely available as well. However, the technical and financial challenges that accompany scanning the mountain of material that was published before PDFs existed make this a project that will be ongoing, and contingent on donated funding.
This is really wonderful news, particularly the part about bringing the Review’s older printed material into the modern era of digital permanence. But I still think that Larry Solum’s “three cheers for the Northwestern University Law Review” is too generous by a factor of one cheer.
Issuing current scholarship in PDF format makes a certain amount of logistical sense. What was a relative rarity (although not at all unheard of) when I went to law school is now commonplace: spurred by economic considerations, more and more journals have taken the digital typesetting process in-house. Articles are edited in digital format (chiefly Microsoft Word, it seems) and laid out according to the journal’s own in-house templates, yielding a final PDF copy that can then be transmitted electronically to be printed and bound. Because the journal is going to produce a PDF copy anyway for its own purposes, it takes little extra effort to post the PDF online, and voilà, you’ve gone open-access.
But in circumstances where a journal wouldn’t be producing PDFs for independent in-house reasons (such as when digitizing one’s back catalog), why standardize on PDF? PDF is fine, but it’s still a proprietary format. It’s a proprietary format that makes sense when one needs to preserve the exact layout and formatting of a page to be printed, but why do that with the older issues? Why not simply convert to HTML/XHTML/XML instead?
Posting journal articles online in a truly open format like HTML is a way of bringing those articles, even older articles, into the contemporary debate in a way that PDF can’t hope to match. To take only two examples:
- If an HTML-ized article has been properly tagged, other web sites can hyperlink directly into the portion of the article they’re citing. I can write an article with a link that takes the reader directly into the portion of the 100-page original piece that I’m most interested in. It’s needlessly clumsy to give a reader a link to a PDF-format article and then tell them to manually locate page 478 within that article to find the part you’re citing. This is one of the limitations of the printed page that the Web potentially frees us from, if we simply take advantage of it.
- Cutting and pasting text from a PDF for purposes of quotation is a dicey proposition. It’s altogether impossible if the PDF was generated as a series of scanned images rather than as OCR’ed text; and even if the PDF contains text that can be copied and pasted, the original’s line-by-line layout may come along for the ride. Quoting a passage then means removing surplus line breaks, deleting surplus hyphens, and generally “fighting” the page formatting of the original piece in circumstances where it’s no longer relevant.
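To make the cleanup chore in that second example concrete, here is a minimal Python sketch of what anyone quoting from a PDF ends up doing by hand. The function name `unwrap_pdf_text` is hypothetical, and the rules are assumptions of mine about what typical copied text looks like: a hyphen at the end of a line is treated as a layout artifact, and a blank line is treated as a paragraph break. That first heuristic is imperfect, since it will also fuse a genuine compound such as "open-access" if the line happened to break at the hyphen.

```python
import re


def unwrap_pdf_text(raw: str) -> str:
    """Undo print-layout line wrapping in text copied out of a PDF.

    Heuristic sketch only: it assumes end-of-line hyphens are layout
    artifacts and that blank lines separate paragraphs.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    # Rejoin words split across lines: "juris-\ndiction" -> "jurisdiction".
    text = re.sub(r"(\w)-\n[ \t]*(\w)", r"\1\2", text)

    # Treat one or more blank lines as a paragraph boundary.
    paragraphs = re.split(r"\n\s*\n", text)

    # Inside each paragraph, collapse line breaks and runs of spaces.
    cleaned = [" ".join(p.split()) for p in paragraphs]

    return "\n\n".join(p for p in cleaned if p)


if __name__ == "__main__":
    sample = "The juris-\ndiction of the court\nis limited.\n\nSecond para-\ngraph here."
    print(unwrap_pdf_text(sample))
```

None of this is necessary when the source is HTML, because the text there is not hard-wrapped to fit a printed page in the first place.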
Even where the original page formatting is important for some reason—such as for the purpose of enabling pinpoint cites—that alone isn’t a sufficient reason to use PDF, it seems to me. For example, FindLaw posts Supreme Court opinions in HTML format, with the original page breaks maintained intact through small intralineal notations in a different color. Unobtrusive, but sufficient to permit citation (and hyperlinking!) directly to the pertinent page of a decision.
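As a rough illustration of how that kind of page-break notation could be produced, the Python sketch below turns a list of page-numbered text chunks into HTML with an inline marker at the start of each print page. Everything specific here is an assumption for illustration: the `[*478]` marker style, the `p478` id scheme, the `page-break` class name, the function names, and the example.com URL are hypothetical, and this is not a description of FindLaw's actual markup.

```python
from html import escape


def render_with_page_markers(pages):
    """Render (page_number, text) pairs as one HTML paragraph.

    Each print page is introduced by a small anchor whose id
    ("p" + page number) other sites can link to directly.
    """
    parts = []
    for number, text in pages:
        # The anchor doubles as a visible pinpoint-cite marker and a link target.
        marker = f'<a id="p{number}" class="page-break">[*{number}]</a>'
        # Escape the body text so characters like & and < are safe in HTML.
        parts.append(f"{marker} {escape(text)}")
    return "<p>" + " ".join(parts) + "</p>"


def pinpoint_url(article_url, page_number):
    """Build a link that lands on the marker for one print page."""
    return f"{article_url}#p{page_number}"


if __name__ == "__main__":
    html = render_with_page_markers([
        (477, "text ending page 477"),
        (478, "text beginning page 478"),
    ])
    print(html)
    print(pinpoint_url("https://example.com/article.html", 478))
```

A style rule keyed to the marker's class could render it small and in a different color, which keeps it unobtrusive while still supporting both pinpoint citation and direct hyperlinking.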
None of this is meant to detract from the value in what Northwestern is doing, and it would be wonderful to see other journals follow suit. Many of the big-name law reviews have decades’ (and in some cases, more than a century’s) worth of important pieces in their stacks that aren’t accessible online in any form (even in the big-name commercial research databases, which go back only so far). It would be wonderful to get all those resources online in a freely accessible format. Those who strive towards the goal of open access, though, would do well to consider that the greatest strides will come from choosing the most open formats available.