March 12th, 2011
|Hubble’s Dark Matter Map from flickr user NASA Goddard Photo and Video, used by permission|
The Harvard repository, DASH, comprises several thousand articles in all fields of scholarship. These articles are stored and advertised through an item page providing metadata — such as title, author, citation, abstract, and link to the definitive version of record — which typically allows downloading of the article as well. But not all articles are distributed. On some of the item pages, the articles themselves can’t be downloaded; they are “dark”. The decision whether or not to allow for dark articles in a repository comes up sufficiently often that it is worth rehearsing the several reasons to allow it.
- Posterity: Repositories have a role in providing access to scholarly articles of course. But an important part of the purpose of a repository is to collect the research output of the institution as broadly as possible. Consider the mission of a university archives, well described in this Harvard statement: “The Harvard University Archives (HUA) supports the University’s dual mission of education and research by striving to preserve and provide access to Harvard’s historical records; to gather an accurate, authentic, and complete record of the life of the University; and to promote the highest standards of management for Harvard’s current records.” Although the role of the university archives and the repository are different, that part about “gather[ing] an accurate, authentic, and complete record of the life of the University” reflects this role of the repository as well.Since at any given time some of the articles that make up that output will not be distributable, the broadest collection requires some portion of the collection to be dark.
- Change: The rights situation for any given article can change over time — especially over long time scales, librarian time scales — and having materials in the repository dark allows them to be distributed if and when the rights situation allows. An obvious case is articles under a publisher embargo. In that case, the date of the change is known, and repository software can typically handle the distributability change automatically. There are also changes that are more difficult to predict. For instance, if a publisher changes its distribution policies, or releases backfiles as part of a corporate change, this might allow distribution where not previously allowed. Having the materials dark means that the institution can take advantage of such changes in the rights situation without having to hunt down the articles at that (perhaps much) later date.
- Preservation: Dark materials can still be preserved. Preservation of digital objects is by and large an unknown prospect, but one thing we know is that the more venues and methods available for preservation, the more likely the materials will be preserved. Repositories provide yet another venue for preservation of their contents, including the dark part.
- Discoverability: Although the articles themselves can’t be distributed, their contents can be indexed to allow for the items in the repository to be more easily and accurately located. Articles deposited dark can be found based on searches that hit not only the title and abstract but the full text of the article. And it can be technologically possible to pass on this indexing power to other services indexing the repository, such as search engines.
- Messaging: When repositories allow both open and dark materials, the message to faculty and researchers can be made very simple: Always deposit. Everything can go in; the distribution decision can be made separately. If authors have to worry about rights when making the decision whether to deposit in the first place, the cognitive load may well lead them to just not deposit. Since the hardest part about running a successful repository is getting a hold of the articles themselves, anything that lowers that load is a good thing. This point has been made forcefully by Stevan Harnad. It is much easier to get faculty in the habit of depositing everything than in the habit of depositing articles subject to the exigencies of their rights situations.
- Availability: There are times when an author has distribution rights only to unavailable versions of an article. For instance, an author may have rights to distribute the author’s final manuscript, but not the publisher’s version. Or an art historian may not have cleared rights for online distribution of the figures in an article and may not be willing to distribute a redacted version of the article without the figures. The ability to deposit dark enables depositing in these cases too. The publisher’s version or unredacted version can be deposited dark.
- Education: Every time an author deposits an article dark is a learning moment reminding the author that distribution is important and distribution limitations are problematic.
For all these reasons, I believe that it is important to allow for dark items in an article repository. Better dark than missing.
[Hat tip to Sue Kriegsman for discussions on this issue.]