…wave of the future…
“Nonantum Wave” photo by flickr user mjsawyer. Used by permission (CC-BY-NC-ND 2.0).

I get the sense that we’ve moved into a new phase in discussions of open access. There seems to be a consensus that open access is an inevitability. We’re hearing this not only from the usual suspects in academia but from publishers, policy-makers, and other interested parties. I’ve started collecting pertinent quotes. The voices remarking on the inevitability of open access range from congressional representatives sponsoring the pro-OA FRPAA bill (Representative Lofgren) to the sponsors of the anti-OA RWA bill (Representatives Issa and Maloney), from open-access publishers (Sutton of Co-Action) to the oldest of old-guard subscription publishers (Campbell of Nature). Herewith, a selection. Pointers to other examples would be greatly appreciated.

“I agree which is why I am a cosponsor of the bill [FRPAA, H.R. 4004], but I think even if the bill does not pass, this [subscription journal] model is dead. It is just a question of how long the patient is going to be on life support.”

— Zoe Lofgren (D-CA), March 29, 2012

“As the costs of publishing continue to be driven down by new technology, we will continue to see a growth in open access publishers. This new and innovative model appears to be the wave of the future.”

— Darrell Issa (R-CA) and Carolyn Maloney (D-NY), co-sponsors of H.R. 3699 (“The Research Works Act”), February 27, 2012

“I realise this move to open access presents a challenge and opportunity for your industry, as you have historically received funding by charging for access to a publication. Nevertheless that funding model is surely going to have to change even beyond the welcome transition to open access and hybrid journals that’s already underway. To try to preserve the old model is the wrong battle to fight.”

— David Willetts (MP, Minister of State for Universities and Science), May 2, 2012

“[A] change in the delivery of scientific content and in the business models for delivering scholarly communication was inevitable from the moment journals moved online, even if much of this change is yet to come.”

— Caroline Sutton (Publisher, Co-Action Publishing), December 2011

“My personal belief is that that’s what’s going to happen in the long run.”

— Philip Campbell (Editor-in-chief, Nature), June 8, 2012

“In the longer term, the future lies with open access publishing,” said Finch at the launch of her report on Monday. “The UK should recognise this change, should embrace it and should find ways of managing it in a measured way.”

— Janet Finch (Professor of Sociology, University of Manchester; Chair, Working Group on Expanding Access to Published Research Findings), June 18, 2012

“Open access is here to stay, and has the support of our key partners.”

— Association of Learned and Professional Society Publishers, July 25, 2011

 (Hat tip to Peter Suber for pointers to a couple of these quotes.)

Talmud and the Turing Test

June 16th, 2012

…the Golem…
Image of the statue of the Golem of Prague at the entrance to the Jewish Quarter of Prague by flickr user D_P_R. Used by permission (CC-BY 2.0).

Alan Turing, the patron saint of computer science, was born 100 years ago this week (June 23). I’ll be attending the Turing Centenary Conference at the University of Cambridge this week, and am honored to be giving an invited talk on “The Utility of the Turing Test”. The Turing Test was Alan Turing’s proposal for an appropriate criterion for attributing intelligence (that is, capacity for thinking) to a machine: you verify through blinded interactions that the machine has verbal behavior indistinguishable from a person.

In preparation for the talk, I’ve been looking at the early history of the premise behind the Turing Test, that language plays a special role in distinguishing thinking from nonthinking beings. I had thought it was an Enlightenment idea: until the technological advances of the 16th and 17th centuries, especially clockwork mechanisms, the whole question of thinking machines could not have been entertained as a subject of substantive discussion. As I wrote earlier,

Clockwork automata provided a foundation on which one could imagine a living machine, perhaps even a thinking one. In the midst of the seventeenth-century explosion in mechanical engineering, the issue of the mechanical nature of life and thought is found in the philosophy of Descartes; the existence of sophisticated automata made credible Descartes’s doctrine of the bête-machine (beast-machine), that animals were machines. His argument for the doctrine incorporated the first indistinguishability test between human and machine, the first Turing test, so to speak.

But I’ve seen occasional claims here and there that there is in fact a Talmudic basis to the Turing Test. Could this be true? Was the Turing Test presaged, not by centuries, but by millennia?

Uniformly, the evidence for Talmudic discussion of the Turing Test is a single quote from Sanhedrin 65b.

Rava said: If the righteous wished, they could create a world, for it is written, “Your iniquities have been a barrier between you and your God.” For Rava created a man and sent him to R. Zeira. The Rabbi spoke to him but he did not answer. Then he said: “You are [coming] from the pietists: Return to your dust.”

Rava creates a Golem, an artificial man, but Rabbi Zeira recognizes it as nonhuman by its lack of language and returns it to the dust from which it was created.

This story certainly describes the use of language to unmask an artificial human. But is it a Turing Test precursor?

It depends on what one thinks are the defining aspects of the Turing Test. I take the central point of the Turing Test to be a criterion for attributing intelligence. The title of Turing’s seminal Mind article is “Computing Machinery and Intelligence”, wherein he addresses the question “Can machines think?”. Crucially, the question is whether the “test” being administered by Rabbi Zeira is testing the Golem for thinking, or for something else.

There is no question that verbal behavior can be used to test for many things that are irrelevant to the issues of the Turing Test. We can go much earlier than the Mishnah to find examples. In Judges 12:5–6 (King James Version):

5 And the Gileadites took the passages of Jordan before the Ephraimites: and it was so, that when those Ephraimites which were escaped said, Let me go over; that the men of Gilead said unto him, Art thou an Ephraimite? If he said, Nay;

6 Then said they unto him, Say now Shibboleth: and he said Sibboleth: for he could not frame to pronounce it right. Then they took him, and slew him at the passages of Jordan: and there fell at that time of the Ephraimites forty and two thousand.

The Gileadites use verbal indistinguishability (of the pronunciation of the original shibboleth) to unmask the Ephraimites. But they aren’t executing a Turing Test. They aren’t testing for thinking but rather for membership in a warring group.

What is Rabbi Zeira testing for? I’m no Talmudic scholar, so I defer to the experts. My understanding is that the Golem’s lack of language indicated not its own deficiency per se, but the deficiency of its creators. The Golem is imperfect in not using language, a sure sign that it was created by pietistic kabbalists who themselves are without sufficient purity.

Talmudic scholars note that the deficiency the Golem exhibits is intrinsically tied to the method by which the Golem is created: language. The kabbalistic incantations that ostensibly vivify the Golem were generated by mathematical combinations of the letters of the Hebrew alphabet. Contemporaneous understanding of the Golem’s lack of speech was connected to this completely formal method of kabbalistic letter magic: “The silent Golem is, prima facie, a foil to the recitations involved in the process of his creation.” (Idel, 1990, pages 264–5) The imperfection demonstrated by the Golem’s lack of language is not its inability to think, but its inability to wield the powers of language manifest in Torah, in prayer, in the creative power of the kabbalist incantations that gave rise to the Golem itself.

Only much later does interpretation start connecting language use in the Golem to soul, that is, to an internal flaw: “However, in the medieval period, the absence of speech is related to what was conceived then to be the highest human faculty: reason according to some writers, or the highest spirit, Neshamah, according to others.” (Idel, 1990, page 266, emphasis added)

By the 17th century, the time was ripe for consideration of whether nonhumans had a rational soul, and how one could tell. Descartes’s observations on the special role of language then serve as the true precursor to the Turing Test. Unlike the sole Talmudic reference, Descartes discusses the connection between language and thinking in detail and in several places — the Discourse on the Method, the Letter to the Marquess of Newcastle — and his followers — Cordemoy, La Mettrie — pick up on it as well. By Turing’s time, it is a natural notion, and one that Turing operationalizes for the first time in his Test.

The test of the Golem in the Sanhedrin story differs from the Turing Test in several ways. There is no discussion that the quality of language use was important (merely its existence), no mention of indistinguishability of language use (though Descartes didn’t mention that either), and certainly no consideration of Turing’s idea of blinded controls. But the real point is that at heart the Golem test was not originally a test for the intelligence of the Golem at all, but of the purity of its creators.

References

Idel, Moshe. 1990. Golem: Jewish Magical and Mystical Traditions on the Artificial Anthropoid. Albany, N.Y.: State University of New York Press.

Pity the poor, beleaguered “Impact Factor™” (IF), a secret mathematistic formula originally intended to serve as a proxy for journal quality. No one seems to like it much. The manifold problems with IF have been rehearsed to death:

  • The calculation isn’t a proper average (see the sketch just after this list).
  • The calculation is statistically inappropriate.
  • The calculation ignores most of the citation data.
  • The calculated values aren’t reproducible.
  • Citation rates, and hence Impact Factors, vary considerably across fields, making cross-discipline comparison meaningless.
  • Citation rates vary across languages. Ditto.
  • IF varies over time, and varies differentially for different types of journals.
  • IF is manipulable by publishers.
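
For concreteness, here is a minimal sketch of the standard two-year IF calculation, which also shows why the first complaint above holds. (The numbers are hypothetical, and this is a schematic reconstruction, not Thomson Reuters’ actual pipeline.)

```python
def two_year_impact_factor(citations_to_all_items, citable_items):
    """Impact Factor of a journal for year Y.

    citations_to_all_items: citations received during year Y by anything
        the journal published in years Y-1 and Y-2, including editorials,
        letters, and other front matter.
    citable_items: number of "citable" items (research articles and
        reviews) the journal published in years Y-1 and Y-2.

    The mismatch is the point: the numerator counts citations to all
    items, while the denominator counts only citable ones, so the result
    is not a proper per-article average.
    """
    return citations_to_all_items / citable_items

# Hypothetical journal: 1,200 citations in 2011 to material published in
# 2009-2010, of which 400 items counted as citable.
print(two_year_impact_factor(1200, 400))  # 3.0
```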

The study by the International Mathematical Union is especially trenchant on these matters, as is Björn Brembs’ take. I don’t want to pile on, just to look at some new data showing that IF has been getting even more problematic over time.

One of the most egregious uses of IF is in promotion and tenure discussions. It’s been understood for a long time that the Impact Factor, given its manifest problems as a method for ranking journals, is completely inappropriate for ranking articles. As the European Association of Science Editors has said:

Therefore the European Association of Science Editors recommends that journal impact factors are used only – and cautiously – for measuring and comparing the influence of entire journals, but not for the assessment of single papers, and certainly not for the assessment of researchers or research programmes either directly or as a surrogate.

Even Thomson Reuters says:

The impact factor should be used with informed peer review. In the case of academic evaluation for tenure it is sometimes inappropriate to use the impact of the source journal to estimate the expected frequency of a recently published article.

“Sometimes inappropriate.” Snort.

…the money chart…
Graph from Lozano et al., 2012

Check out the money chart from the recent paper “The weakening relationship between the Impact Factor and papers’ citations in the digital age” by George A. Lozano, Vincent Larivière, and Yves Gingras.

They address the issue of whether the most highly cited papers tend to appear in the highest Impact Factor journals, and how that has changed over time. One of their analyses looked at the papers that fall in the top 5% for number of citations over a two-year period following publication, and depicts what percentage of these do not appear in the top 5% of journals as ranked by Impact Factor. If Impact Factor were a perfect reflection of the future citation rate of the articles in the journal, this number should be zero.
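
To make their measure concrete, here is a back-of-the-envelope reconstruction in Python of the statistic being plotted, based on my reading of their description (this is not their code, and numpy is assumed):

```python
import numpy as np

def pct_top_papers_outside_top_journals(citations, journal_ifs, top=0.05):
    """Percentage of the top 5% most-cited papers that appeared outside
    the top 5% of journals as ranked by Impact Factor.

    citations:   per-paper citation counts in the two years after publication
    journal_ifs: the Impact Factor of the journal each paper appeared in
    """
    citations = np.asarray(citations, dtype=float)
    journal_ifs = np.asarray(journal_ifs, dtype=float)

    # Cutoffs for the "top 5%": papers ranked by citations, journals by IF.
    paper_cutoff = np.quantile(citations, 1 - top)
    journal_cutoff = np.quantile(np.unique(journal_ifs), 1 - top)

    top_papers = citations >= paper_cutoff
    outside_top_journals = journal_ifs < journal_cutoff
    return 100.0 * outside_top_journals[top_papers].mean()
```

(Ranking journals via their unique IF values is a simplification for the sketch; Lozano et al. rank the journals themselves.)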

As it turns out, the percentage has been extremely high over the years. The majority of top papers fall into this group, indicating that restricting attention to top Impact Factor journals doesn’t nearly cover the best papers. This by itself is not too surprising, though it doesn’t bode well for IF.

More interesting is the trajectory of the numbers. At one point, roughly up through World War II, the numbers were in the 70s and 80s. Three quarters of the top-cited papers were not in the top IF journals. After the war, a steady consolidation of journal brands, along with the invention of the formal Impact Factor in the 60s and its increased use, led to a steady decline in the percentage of top articles in non-top journals. Basically, a journal’s imprimatur — and its IF along with it — became a better and better indicator of the quality of the articles it published. (Better, but still not particularly good.)

This process ended around 1990. As electronic distribution of individual articles took over for distribution of articles bundled within printed journal issues, it became less important which journal an article appeared in. Articles more and more lived and died by their own inherent quality rather than by the quality signal inherited from their publishing journal. The pattern in the graph is striking.

The important ramification is that the Impact Factor of a journal is an increasingly poor metric of quality, especially at the top end. And it is likely to get even worse. Electronic distribution of individual articles is only increasing, and as the Impact Factor signal decreases, there is less motivation to publish the best work in high IF journals, compounding the change.

Meanwhile, computer and network technology has brought us to the point where we can develop and use metrics that serve as proxies for quality at the individual article level. We don’t need to rely on journal-level metrics to evaluate articles.

Given all this, promotion and tenure committees should proscribe consideration of journal-level metrics — including Impact Factor — in their deliberations. Instead, if they must use metrics, they should use article-level metrics only, or better yet, read the articles themselves.


…good behavior…
Thumbs up.

Harvard made a big splash recently when my colleagues on the Faculty Advisory Council to the Harvard Library distributed a Memorandum on Journal Pricing. One of the main problems with the memo, however, is that the recommendations it makes are relatively imprecise. It exhorts faculty to work with journals and scholarly societies on their publishing and pricing practices, but provides no concrete advice on exactly what to request. What is good behavior?

I just met with a colleague here at Harvard who raised the issue with me directly. He’s a member of the editorial board of a journal and wants to do the right thing and make sure that the journal’s policies make them a good actor. But he doesn’t want to (and shouldn’t be expected to) learn all the ins and outs of the scholarly publishing business, the legalisms of publication agreements and copyright, and the interactions of all that with the policies of the particular journal. He’s not alone; there are many others in the same boat. Maybe you are too. Is there some pithy request you can make of the journal that encapsulates good publishing practices?

(I’m assuming here that converting the journal to open access is off the table. Of course, that would be preferable, but it’s unlikely to get you very far, especially given that the most plausible revenue model for open-access journal publishing, namely publication fees, is not yet well supported by the scholarly publishing ecology.)

There are two kinds of practices that the Harvard memo moots: it explicitly mentions pricing practices of journals, and implicitly brings up author rights issues in its recommendations. Scholar participants in journals (editors, editorial board members, reviewers) may want to discuss both kinds of practices with their publishers. I have recommendations for both.

Rights practices

Here’s my candidate recommendation for ensuring a subscription journal has good rights practice. You (and, ideally, your fellow editorial board members) hand the publisher a copy of the Science Commons Delayed Access (SCDA) addendum. (Here’s a sample.) You request that they adjust their standard article publication agreement so as to make the addendum redundant. This request has several nice effects.

  1. It’s simple, concrete, and unambiguous.
  2. It describes the desired result in terms of functionality — what the publication agreement should achieve — not how it should be worded.
  3. It guarantees that the journal exhibits best practices for a subscription journal. Any journal that can satisfy the criterion that the SCDA addendum is redundant:
    • Lets authors retain rights to use and reuse the article in further research and scholarly activities,
    • Allows green OA self-archiving without embargo,
    • Allows compliance with any funder policies (such as the NIH Public Access Policy),
    • Allows compliance with employer policies (such as university open-access policies) without having to get a waiver, and
    • Allows distribution of the publisher’s version of the article after a short embargo period.
  4. It applies to journals of all types. (Just because the addendum comes from Science Commons doesn’t mean it’s not appropriate for non-science journals.)
  5. It doesn’t require the journal to give up exclusivity to its published content (though it makes that content available with a moving six-month wall).

The most controversial aspect of an SCDA-compliant agreement from the publisher’s point of view is likely the ability to distribute the publisher’s version of the article after a six-month embargo. I wouldn’t be wed to that six-month figure. This provision would be the first thing to negotiate, by increasing the embargo length — to one year, two years, even five years. But sticking to some finite embargo period for distributing the publisher’s version is a good idea, if only to establish the precedent. Once the journal is committed to allowing distribution of the publisher’s version after some period, the particular embargo length might be reduced over time.

Pricing practices

The previous proposal does a good job, to my mind, of encapsulating a criterion of best publication-agreement practice, but it doesn’t address the important issue of pricing practice. Indeed, with respect to pricing practices, it’s tricky to define good value. Looking at the brute price of a journal is useless, since journals publish wildly different numbers of articles, from the single digits to the four digits per year, so variation in price per journal across three orders of magnitude is to be expected. Price per article and price per page are more plausible metrics of value, but even there, because journals differ in the quality of the articles they tend to publish, and hence in their likely utility to readers, variation in these metrics should be expected as well. For this reason, some analyses of value look to citation rate as a proxy for quality, leading to a calculation of price per citation.

Another problem is getting appropriate measures of the numerator in these metrics. When calculating price per article or per page or per citation, what price should one use? Institutional list price is a good start. List price for individual subscriptions is more or less irrelevant, given that individual subscriptions account for a small fraction of revenue. But publishers, especially major commercial publishers with large journal portfolios, practice bundling and price discrimination that make it hard to get a sense of the actual price that libraries pay. On the other hand, list price is certainly an upper bound on the actual price, so not an unreasonable proxy.

Finally, any of these metrics may vary systematically across research fields, so the metrics ought to be relativized within a field.

Ted and Carl Bergstrom have collected just this kind of data for a large range of journals at their journalprices.com site, calculating price per article and price per citation along with a composite index calculated as the geometric mean of the two. To handle the problem of field differences, they provide a relative price index (RPI) that compares the composite to the median for non-profit journals within the field, and propose that a journal be considered “good value” if RPI is less than 1.25, “medium value” if its RPI is less than 2, and “bad value” otherwise.
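
In code, that calculation is simple. Here is a minimal sketch in Python of my reading of their method (the journal’s numbers and the field’s non-profit indexes below are invented for illustration):

```python
from statistics import geometric_mean, median

def composite_price_index(price, articles, citations):
    """Geometric mean of price per article and price per citation."""
    return geometric_mean([price / articles, price / citations])

def relative_price_index(journal_cpi, field_nonprofit_cpis):
    """Composite index relative to the median non-profit journal in the field."""
    return journal_cpi / median(field_nonprofit_cpis)

# Hypothetical journal: $4,000 institutional price, 120 articles per year,
# 2,000 citations to those articles.
cpi = composite_price_index(4000, 120, 2000)      # about 8.16
rpi = relative_price_index(cpi, [2.0, 2.6, 3.1])  # about 3.14: "bad value"
print(round(rpi, 2))
```

(statistics.geometric_mean requires Python 3.8 or later.)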

As a good first cut at a simple message to a journal publisher, then, you could request that the price of the journal be reduced to bring its RPI below 1.25 (that is, good value), or at least below 2 (medium value). Since lots of journals run in the black with composite price indexes below the median, that is, with RPI below 1, an RPI of 2 should be attainable for an efficient publisher. (My colleague’s journal, the one that precipitated this post, has an RPI of 2.85. Plenty of room for improvement.)

In summary, you ask the journal to change its publication agreement to be SCDA-compliant and its price to bring its RPI below 2. That’s specific, pragmatic, and actionable. If the journal won’t comply, you at least know where it stands. If you don’t like the answers you’re getting, you can work to find a new publisher willing to play ball, or at least decline to lend your free labor to the current one.