
Moderating principles

July 25th, 2022

Some time around April 1994, I founded the Computation and Language E-Print Archive, the first preprint repository for a subfield of computer science. It was hosted on Paul Ginsparg’s arXiv platform, which at the time had been hosting only physics papers, built out from the original arXiv repository for high-energy physics theory, hep-th. The repository, cmp-lg (as it was then called), was superseded in 1999 by an open-access preprint repository for all of computer science, the Computing Research Repository (CoRR), which covered a broad range of subject areas, including computation and language. The CoRR organizing committee also decided to host CoRR on arXiv. I switched over to moderating for the CoRR repository from cmp-lg, and have continued to do so for the last – oh my god – 22 years.[1]

Articles in the arXiv are classified with a single primary subject class, and may have other subject classes as secondary. The switchover folded cmp-lg into the arXiv as articles tagged with the cs.CL (computation and language) subject class. I thus became the moderator for cs.CL.

A preprint repository like the arXiv is not a journal. There is no peer review applied to articles. There is essentially no quality control. That is not the role of a preprint repository. The role of a preprint repository is open distribution, not vetting. Nonetheless, some kind of control is needed to make sure that, at the very least, the documents being submitted are in fact scholarly articles and are appropriately tagged as to subfield, and that need has expanded with the dramatic increase in submissions to CoRR over the years. The primary duty of a moderator is to perform this vetting and triage: verifying that a submission meets the minimum standards for being characterized as a scholarly article, and that it falls within the purview of, say, cs.CL, as a primary or secondary subject class.

I am (along with the other arXiv moderators) thus regularly in the position of having to make decisions as to whether a document is a scholarly article or not. To a large extent, Justice Potter Stewart’s approach works reasonably well for scholarly articles: you know them when you see them. But over time, as more marginal cases come up, I’ve felt that tracking my thinking on the matter would be useful for maintaining consistency in my own practice. And now that I’ve done that for a while, I thought it might be useful to share my approach more broadly. That is the goal of this post.

The following thus constitutes (some of) the de facto policies that I use in making decisions as the moderator for the cs.CL collection in the CoRR part of the arXiv repository. I emphasize that these are my policies, not those of CoRR or the moderators of other CoRR subjects. (The arXiv folks themselves provide a more general guide for arXiv moderators.)

Whence function notation?

September 28th, 2015


I begin — in continental style, unmotivated and, frankly, gratuitously — by defining Ackermann’s function \(A\) over two non-negative integers:

\[ A(m, n) = \left\{ \begin{array}{ll}
n + 1 & \mbox{if $m = 0$} \\
A(m-1, 1) & \mbox{if $m > 0$ and $n = 0$} \\
A(m-1, A(m, n-1)) & \mbox{if $m > 0$ and $n > 0$}
\end{array} \right. \]
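For the programmatically inclined, the definition transcribes directly into code. Here is a minimal sketch in Python, purely for illustration; the values (and the recursion depth) grow so explosively that only tiny arguments are feasible.

    # Ackermann's function, transcribed directly from the definition above.
    def A(m, n):
        if m == 0:
            return n + 1
        elif n == 0:
            return A(m - 1, 1)
        else:
            return A(m - 1, A(m, n - 1))

    print(A(2, 3))  # 9
    print(A(3, 3))  # 61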

…drawing their equations evanescently in dust and sand… (Image of “Death of Archimedes” from Charles F. Horne, editor, Great Men and Famous Women, Volume 3, 1894. Reproduced by Project Gutenberg. Used by permission.)

You’ll have appreciated (unconsciously no doubt) that this definition makes repeated use of a notation in which a symbol precedes a parenthesized list of expressions, as for example \(f(a, b, c)\). This configuration represents the application of a function to its arguments. But you knew that. And why? Because everyone who has ever gotten through eighth grade math has been taught this notation. It is inescapable in high school algebra textbooks. It is a standard notation in the most widely used programming languages. It is the very archetype of common mathematical knowledge. It is, for God’s sake, in the Common Core. It is to mathematicians as water is to fish — so encompassing as to be invisible.

Something so widespread, so familiar — it’s hard to imagine how it could be otherwise. It’s difficult to un-see it as anything but function application. But it was not always thus. Someone must have invented this notation, some time in the deep past. Perhaps it came into being when mathematicians were still drawing their equations evanescently in dust and sand. Perhaps all record has been lost of that ur-application that engendered all later function application expressions. Nonetheless, someone must have come up with the idea.

…that ur-application… (Photo from the author.)

Surprisingly, the origins of the notation are not shrouded in mystery. The careful and exhaustive scholarship of mathematical historian Florian Cajori (1929, page 267) argues for a particular instance as originating the use of this now ubiquitous notation. Leonhard Euler, the legendary mathematician and perhaps the greatest innovator in successful mathematical notations, proposed the notation first in 1734, in Section 7 of his paper “Additamentum ad Dissertationem de Infinitis Curvis Eiusdem Generis” [“An Addition to the Dissertation Concerning an Infinite Number of Curves of the Same Kind”].

The paper was published in 1740 in Commentarii Academiae Scientiarum Imperialis Petropolitanae [Memoirs of the Imperial Academy of Sciences in St. Petersburg], Volume VII, covering the years 1734–35. A visit to the Widener Library stacks produced a copy of the volume, letterpress printed on crisp rag paper, from which I took the image shown above of the notational innovation.

Here is the pertinent sentence (with translation by Ian Bruce):

Quocirca, si \(f\left(\frac{x}{a} +c\right)\) denotet functionem quamcunque ipsius \(\frac{x}{a} +c\) fiet quoque \(dx − \frac{x\, da}{a}\) integrabile, si multiplicetur per \(\frac{1}{a} f\left(\frac{x}{a} + c\right)\).
[On account of which, if \(f\left(\frac{x}{a} +c\right)\) denotes some function of \(\frac{x}{a} +c\), it also makes \(dx − \frac{x\, da}{a}\) integrable, if it is multiplied by \(\frac{1}{a} f\left(\frac{x}{a} + c\right)\).]

There is the function symbol — the archetypal \(f\), even then, to evoke the concept of function — followed by its argument corralled within simple curves to make clear its extent.

It’s seductive to think that there is an inevitability to the notation, but this is an illusion, following from habit. There are alternatives. Leibniz for instance used a boxy square-root-like diacritic over the arguments, with numbers to pick out the function being applied: \( \overline{a; b; c\,} \! | \! \lower .25ex {\underline{\,{}^1\,}} \! | \), and even Euler, in other later work, experimented with interposing a colon between the function and its arguments: \(f : (a, b, c)\). In the computing world, “reverse Polish” notation, found on HP calculators and the programming languages Forth and Postscript, has the function symbol following its arguments: \(a\,b\,c\,f\), whereas the quintessential functional programming language Lisp parenthesizes the function and its arguments: \((f\ a\ b\ c)\).

Finally, ML and its dialects follow Church’s lambda calculus in merely concatenating the function and its (single) argument — \(f \, a\) — using parentheses only to disambiguate structure. But even here, Euler’s notation stands its ground, for the single argument of a function might itself have components, a ‘tuple’ of items \(a\), \(b\), and \(c\) perhaps. The tuples might be indicated using an infix comma operator, thus \(a,b,c\). Application of a function to a single tuple argument can then mimic functions of multiple arguments, for instance, \(f (a, b, c)\) — the parentheses required by the low precedence of the tuple forming operator — and we are back once again to Euler’s notation. Clever, no? Do you see the lengths to which people will go to adhere to Euler’s invention? As much as we might try new notational ideas, this one has staying power.
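To see the tuple trick in code, here is a minimal sketch in Python, chosen purely for illustration; in ML the effect is cleaner still, since application there is bare juxtaposition, so the only parentheses in f (a, b, c) are the tuple’s own.

    # Two superficially different styles of "multi-argument" function.
    def g(a, b, c):                 # the familiar style: three arguments
        return a + b * c

    def f(triple):                  # the ML-style view: one argument, a tuple
        a, b, c = triple
        return a + b * c

    print(g(1, 2, 3))               # Euler-style application to three arguments
    print(f((1, 2, 3)))             # application to a single tuple; in ML this is
                                    # written f (1, 2, 3), Euler's notation again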

References

Florian Cajori. 1929. A History of Mathematical Notations, Volume II. Chicago: Open Court Publishing Company.

Leonhard Euler. 1734. Additamentum ad Dissertationem de Infinitis Curvis Eiusdem Generis. In Commentarii Academiae Scientiarum Imperialis Petropolitanae, Volume VII (1734–35), pages 184–202, 1740.

Plain meaning

June 26th, 2015

In its reporting on yesterday’s Supreme Court ruling in King v. Burwell, Vox’s Matthew Yglesias makes the important point that Justice Scalia’s dissent is based on a profound misunderstanding of how language works. Justice Scalia would have it that “words no longer have meaning if an Exchange that is not established by a State is ‘established by the state.’” The Justice is implicitly appealing to a “plain meaning” view of legislation: courts should just take the plain meaning of a law and not interpret it.

If only that were possible. If you think there’s such a thing as acquiring the “plain meaning” of a text without performing any interpretive inference, you don’t understand how language works. It’s the same mistake that fundamentalists make when they talk about looking to the plain meaning of the Bible. (And which Bible would that be anyway? The King James Version? Translation requires the same kind of inferential process – arguably the same actual process – as extracting meaning through reading.)

Yglesias describes “What Justice Scalia’s King v. Burwell dissent gets wrong about words and meaning” this way:

Individual stringz of letterz r efforts to express meaningful propositions in an intelligible way. To succeed at this mission does not require the youse of any particular rite series of words and, in fact, a sntnce fll of gibberish cn B prfctly comprehensible and meaningful 2 an intelligent reader. To understand a phrse or paragraf or an entire txt rekwires the use of human understanding and contextual infrmation not just a dctionry.

The jokey orthography aside, this observation that understanding the meaning of linguistic utterances requires the application of knowledge and inference is completely uncontroversial to your average linguist. Too bad Supreme Court justices don’t defer to linguists on how language works.

Let’s take a simple example, the original “Winograd sentences” from back in 1973:

  1. The city councilmen refused the demonstrators a permit because they feared violence.
  2. The city councilmen refused the demonstrators a permit because they advocated violence.

To understand these sentences, to recover their “plain meaning”, requires resolving to whom the pronoun ‘they’ refers. Is it the city councilmen or the demonstrators? Clearly, it is the former in sentence (1) and the latter in sentence (2). How do you know, given that the two sentences differ only in the single word alternation ‘feared’/‘advocated’? The recovery of this single aspect of the “plain meaning” of the sentence requires an understanding of how governmental organizations work, how activists pursue their goals, likely public reactions to various contingent behaviors, and the like, along with application of all that knowledge through plausible inference. The Patient Protection and Affordable Care Act (PPACA) has by my (computer-aided) count some 479 occurrences of pronouns in nominative, accusative, or possessive. Each one of these requires the identification of its antecedent, with all the reasoning that implies, to get its “plain meaning”.
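A computer-aided count of this sort is easy to sketch. The snippet below is purely illustrative: the file name ppaca.txt and the particular pronoun inventory are assumptions of the sketch, and different choices (whether to count ‘its’, for instance) would yield somewhat different totals.

    # A rough, illustrative pronoun tally over a plain-text copy of a statute.
    # The file name and pronoun list are assumptions of this sketch; it also
    # overcounts a bit (the expletive "it" of "it is necessary" is no anaphor).
    import re
    from collections import Counter

    PRONOUNS = {
        "he", "she", "it", "they",                # nominative
        "him", "her", "them",                     # accusative ("her" doubles as possessive)
        "his", "hers", "its", "their", "theirs",  # possessive
    }

    with open("ppaca.txt", encoding="utf-8") as f:
        text = f.read().lower()

    tokens = re.findall(r"[a-z]+", text)
    counts = Counter(t for t in tokens if t in PRONOUNS)

    print(sum(counts.values()), "pronoun tokens")
    print(counts.most_common())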

Examining the actual textual subject of controversy in the PPACA demonstrates the same issue. The phrase in question is “established by the state”. The American Heritage Dictionary provides six senses and nine subsenses for the transitive verb ‘establish’, of which (by my lights) sense 1a is appropriate for interpreting the PPACA: “To cause (an institution, for example) to come into existence or begin operating.” An alternative reading might, however, be sense 4: “To introduce and put (a law, for example) into force.” The choice of which sense is appropriate requires some reasoning of course about the context in which it was used, the denotata of the subject and object of the verb for instance. If one concludes that sense 1a was intended, then the Supreme Court’s decision is presumably correct, since a state’s formal relegation of the role of running the exchange to the federal government is an act of “causing to come into existence”, although perhaps not an act of “introducing and putting into force”. (Or further explication of the notions of “causing” or “introducing” might be necessary to decide the matter.) If the latter sense 4 were intended, then perhaps the Supreme Court was wrong in its recent decision. The important point is this: There is no possibility of deferring to the “plain meaning” on the issue; one must reason about the intentions of the authors to acquire even the literal meaning of the text. This process is exactly what Chief Justice Roberts undertakes in his opinion. Justice Scalia’s view, that plain meaning is somehow available without recourse to the use of knowledge and reasoning, is unfounded even in the simplest of cases.

My colleague Steven Pinker has a nice piece up at the Chronicle of Higher Education on “Why Academics Stink at Writing”, accompanying the recent release of his new book The Sense of Style: The Thinking Person’s Guide to Writing in the 21st Century, which I’m awaiting my pre-ordered copy of. The last sentence of the Chronicle piece summarizes well:

In writing badly, we are wasting each other’s time, sowing confusion and error, and turning our profession into a laughingstock.

The essay provides a diagnosis of many of the common symptoms of fetid academic writing. He lists metadiscourse, professional narcissism, apologizing, shudder quotes, hedging, metaconcepts and nominalizations. It’s not breaking new ground, but these problems well deserve review.

I fall afoul of these myself, of course. (Nasty truth: I’ve used “inter alia” all too often, inter alia.) But one issue I disagree with Pinker on is the particular style of metadiscourse he condemns that provides a roadmap of a paper. Here’s an example from a recent paper of mine.

After some preliminaries (Section 2), we present a set of known results relating context-free languages, tree homomorphisms, tree automata, and tree transducers to extend them for the tree-adjoining languages (Section 3), presenting these in terms of restricted kinds of functional programs over trees, using a simple grammatical notation for describing the programs. We review the definition of tree-substitution and tree-adjoining grammars (Section 4) and synchronous versions thereof (Section 5). We prove the equivalence between STSG and a variety of bimorphism (Section 6).

This certainly smacks of the first metadiscourse example Pinker provides:

“The preceding discussion introduced the problem of academese, summarized the principle theories, and suggested a new analysis based on a theory of Turner and Thomas. The rest of this article is organized as follows. The first section consists of a review of the major shortcomings of academic prose. …”

Who needs that sort of signposting in a 6,000-word essay? But in the context of a 50-page article, giving a kind of table of contents such as this doesn’t seem out of line. Much of the metadiscourse that Pinker excoriates is unneeded, but appropriate advance signposting can ease the job of the reader considerably. Sometimes, as in the other examples Pinker gives, “metadiscourse is there to help the writer, not the reader, since she has to put more work into understanding the signposts than she saves in seeing what they point to.” But anything that helps the reader to understand the high-level structure of an object as complex as a long article seems like a good thing to me.

The penultimate sentence of Pinker’s piece places poor academic writing in context:

Our indifference to how we share the fruits of our intellectual labors is a betrayal of our calling to enhance the spread of knowledge.

That sentiment applies equally well – arguably more so – to the venues where we publish. By placing our articles in journals that lock up access tightly we are also betraying our calling. And it doesn’t matter how good the writing is if it can’t be read in the first place.

With few exceptions, scholars would be better off writing their papers in a lightweight markup format called Markdown, rather than using a word-processing program like Microsoft Word. This post explains why, and reveals a hidden agenda as well.[1]

Microsoft Word is not appropriate for scholarly article production

…lightweight… (“Old two pan balance” image from Nikodem Nijaki at Wikimedia Commons. Used by permission.)

Before turning to lightweight markup, I review the problems with Microsoft Word as the lingua franca for producing scholarly articles. This ground has been heavily covered. (Here’s a recent example.) The problems include:

Substantial learning curve
Microsoft Word is a complicated program that is difficult to use well.
Appearance versus structure
Word-processing programs like Word conflate composition with typesetting. They work by having you specify how a document should look, not how it is structured. A classic example is section headings. In a typical markup language, you specify that something is a heading by marking it as a heading. In a word-processing program you might specify that something is a heading by increasing the font size and making it bold. Yes, Word has “paragraph styles”, and some people sometimes use them more or less properly, if you can figure out how. But most people don’t, or don’t do so consistently, and the resultant chaos has been well documented. It has led to a whole industry of people who specialize in massaging Word files into some semblance of consistency.
Backwards compatibility
Word-processing program file formats have a tendency to change. Word itself has gone through multiple incompatible file formats in the last decades, one every couple of years. Over time, you have to keep up with the latest version of the software to do anything at all with a new document, but updating your software may well mean that old documents are no longer identically rendered. With Markdown, no software is necessary to read documents. They are just plain text files with relatively intuitive markings, and the underlying file format (UTF-8 née ASCII) is backward compatible to 1963. Further, typesetting documents in Markdown to get the “nice” version is based on free and open-source software (markdown, pandoc) and built on other longstanding open source standards (LaTeX, BibTeX).
Poor typesetting
Microsoft Word does a generally poor job of typesetting, as exemplified by its handling of hyphenation, kerning, and mathematical typesetting. This shouldn’t be surprising, since the whole premise of a word-processing program is that a single interface must handle both specification and typesetting in real time, a recipe for having to make compromises.
Lock-in
Because Microsoft Word’s file format is effectively proprietary, users are locked in to a single software provider for any and all functionality. The file formats are so complicated that alternative implementations are effectively impossible.

Lightweight markup is the solution

The solution is to use a markup format that allows specification of the document (providing its logical structure) separate from the typesetting of that document. Your document is specified – that is, generated and stored – as straight text. Any formatting issues are handled not by changing the formatting directly via a graphical user interface but by specifying the formatting textually using a specific textual notation. For instance, in the HTML markup language, a word or phrase that should be emphasized is textually indicated by surrounding it with <em>…</em>. HTML and other powerful markup formats like LaTeX and various XML formats carry relatively large overheads. They are complex to learn and difficult to read. (Typing raw XML is nobody’s idea of fun.) Ideally, we would want a markup format to be lightweight, that is, simple, portable, and human-readable even in its raw state.

Markdown is just such a lightweight markup language. In Markdown, emphasis is textually indicated by surrounding the phrase with asterisks, as is familiar from email conventions, for example, *lightweight*. See, that wasn’t so hard. Here’s another example: A bulleted list is indicated by prepending each item on a separate line with an asterisk, like this:

 * First item
 * Second item

which specifies the list

  • First item
  • Second item

Because specification and typesetting are separated, software is needed to convert from one to the other, to typeset the specified document. For reasons that will become clear later, I recommend the open-source software pandoc. Generally, scholars will want to convert their documents to PDF (though pandoc can convert to a huge variety of other formats). To convert file.md (the Markdown-format specification file) to PDF, the command

 pandoc file.md -o file.pdf

suffices. Alternatively, there are many editing programs that allow entering, editing, and typesetting Markdown. I sometimes use Byword. In fact, I’m using it now.

Markup languages range from the simple to the complex. I argue for Markdown for four reasons:

  1. Basic Markdown, sufficient for the vast majority of non-mathematical scholarly writing, is dead simple to learn and remember, because the markup notations were designed to mimic the kinds of textual conventions that people are used to – asterisks for emphasis and for indicating bulleted items, for instance. The coverage of this basic part of Markdown includes: emphasis, section structure, block quotes, bulleted and numbered lists, simple tables, and footnotes.
  2. Markdown is designed to be readable and the specified format understandable even in its plain text form, unlike heavier weight markup languages such as HTML.
  3. Markdown is well supported by a large ecology of software systems for entering, previewing, converting, typesetting, and collaboratively editing documents.
  4. Simple things are simple. More complicated things are more complicated, but not impossible. The extensions to Markdown provided by pandoc cover more or less the rest of what anyone might need for scholarly documents, including links, cross-references, figures, citations and bibliographies (via BibTeX), mathematical typesetting (via LaTeX), and much more. For instance, this equation (the Cauchy-Schwarz inequality) will typeset well in generated PDF files, and even in HTML pages using the wonderful MathJax library:

     \[ \left( \sum_{k=1}^n a_k b_k \right)^2 \leq \left( \sum_{k=1}^n a_k^2 \right) \left( \sum_{k=1}^n b_k^2 \right) \]

     (Pandoc also provides some extensions that simplify and extend the basic Markdown in quite nice ways, for instance, definition lists, strikeout text, and a simpler notation for tables.)

Above, I claimed that scholars should use Markdown “with few exceptions”. The exceptions are:

  1. The document requires nontrivial mathematical typesetting. In that case, you’re probably better off using LaTeX. Anyone writing a lot of mathematics has given up word processors long ago and ought to know LaTeX anyway. Still, I’ll often do a first draft in Markdown with LaTeX for the math-y bits. Pandoc allows LaTeX to be included within a Markdown file (as I’ve done above), and preserves the LaTeX markup when converting the Markdown to LaTeX. From there, it can be typeset with LaTeX. Microsoft Word would certainly not be appropriate for this case.
  2. The document requires typesetting with highly refined or specialized aspects. I’d probably go with LaTeX here too, though desktop publishing software (InDesign) is also appropriate if there’s little or no mathematical typesetting required. Microsoft Word would not be appropriate for this case either.

Some have proposed that we need a special lightweight markup language for scholars. But Markdown is sufficiently close, and has such a strong community of support and software infrastructure, that it is more than sufficient for the time being. Further development would of course be helpful, so long as the urge to add “features” doesn’t overwhelm its core simplicity.

The hidden agenda

I have a hidden agenda. Markdown is sufficient for the bulk of cases of composing scholarly articles, and simple enough to learn that academics might actually use it. Markdown documents are also typesettable according to a separate specification of document style, and retargetable to multiple output formats (PDF, HTML, etc.).[2] Thus, Markdown could be used as the production file format for scholarly journals, which would eliminate the need for converting between the authors’ manuscript version and the publisher’s internal format, with all the concomitant errors that process is prone to produce.

In computer science, we have by now moved almost completely to a system in which authors provide articles in LaTeX so that no retyping or recomposition of the articles needs to be done for the publisher’s typesetting system. Publishers just apply their LaTeX style files to our articles. The result has been a dramatic improvement in correctness and efficiency. (It is in part due to such an efficient production process that the cost of running a high-end computer science journal can be so astoundingly low.)

Even better, there is a new breed of collaborative web-based document editing tools being developed that use Markdown as their core file format, tools like Draft and Authorea. They provide multi-author editing, versioning, version comparison, and merging. These tools could constitute the system by which scholarly articles are written, collaborated on, revised, copyedited, and moved to the journal production process, generating efficiencies for a huge range of journals, efficiencies that we’ve enjoyed in computer science and mathematics for years.

As Rob Walsh of ScholasticaHQ says, “One of the biggest bottlenecks in Open Access publishing is typesetting. It shouldn’t be.” A production ecology built around Markdown could be the solution.


  1. Many of the ideas in this post are not new. Complaints about WYSIWYG word-processing programs have a long history. Here’s a particularly trenchant diatribe pointing out the superiority of disentangling composition from typesetting. The idea of “scholarly Markdown” as the solution is also not new. See this post or this one for similar proposals. I go further in viewing certain current versions of Markdown (as implemented in Pandoc) as practical already for scholarly article production purposes, though I support coordinated efforts that could lead to improved lightweight markup formats for scholarly applications. Update September 22, 2014: I’ve just noticed a post by Dennis Tenen and Grant Wythoff at The Programming Historian on “Sustainable Authorship in Plain Text using Pandoc and Markdown” giving a tutorial on using these tools for writing scholarly history articles.
  2. As an example, I’ve used this very blog post. Starting with the Markdown source file (which I’ve attached to this post), I first generated HTML output for copying into the blog using the command
    pandoc -S --mathjax --base-header-level=3 markdownpost.md -o markdownpost.html

    A nicely typeset version using the American Mathematical Society’s journal article document style can be generated with

    pandoc markdownpost.md -V documentclass:amsart -o markdownpost-amsart.pdf

    To target the style of ACM transactions instead, the following command suffices:

    pandoc markdownpost.md -V documentclass:acmsmall -o markdownpost-acmsmall.pdf

    Both PDF versions are also attached to this post.

…that’s not Turing’s Test… (“Turing Test” image from xkcd. Used by permission.)

There has been a flurry of interest in the Turing Test in the last few days, precipitated by a claim that (at last!) a program has passed the Test. The program in question is called “Eugene Goostman” and the claim is promulgated by Kevin Warwick, a professor of cybernetics at the University of Reading and organizer of a recent chatbot competition there.

The Turing Test is a topic that I have a deep interest in (see this, and this, and this, and this, and, most recently, this), so I thought to give my view on Professor Warwick’s claim “We are therefore proud to declare that Alan Turing’s Test was passed for the first time on Saturday.” The main points are these. The Turing Test was not passed on Saturday, and “Eugene Goostman” seems to perform qualitatively about as poorly as many other chatbots in emulating human verbal behavior. In summary: There’s nothing new here; move along.

First, the Turing Test that Turing had in mind was a criterion of indistinguishability in verbal performance between human and computer in an open-ended wide-ranging interaction. In order for the Test to be passed, judges had to perform no better than chance in unmasking the computer. But in the recent event, the interactions were quite time-limited (only five minutes) and in any case, the purported Turing-Test-passing program was identified correctly more often than not by the judges (almost 70% of the time in fact). That’s not Turing’s test.

Update June 17, 2014: The time limitation was even worse than I thought. According to my colleague Luke Hunsberger, computer science professor at Vassar College, who was a judge in this event, the five minute time limit was for two simultaneous interactions. Further, there were often substantial response delays in the system. In total, he estimated that a judge might average only four or five rounds of chat with each interlocutor. I’ve argued before that a grossly time-limited Turing Test is no Turing Test at all.

Sometimes, people trot out the prediction from Turing’s seminal 1950 Mind article that “I believe that in about fifty years’ time it will be possible to programme computers, with a storage capacity of about \(10^9\), to make them play the imitation game so well that an average interrogator will not have more than 70 per cent. chance of making the right identification after five minutes of questioning.” As I explain in my book on the Test:

The first thing to note about the prediction is that it is not a prediction about the Test per se: Turing expects 70 percent prediction accuracy, not the more difficult 50 percent expected by chance, and this after only a limited conversation of five minutes. He is therefore predicting passage of a test much simpler than the Test.

Not only does the prediction not presuppose a full Turing Test, but it could well be argued that it had already come to pass with the advent of Joseph Weizenbaum’s Eliza some thirty-five years early. Weizenbaum developed a simple computer program that attempted to imitate the parroting conversational style of a Rogerian psychotherapist…. Although the methods used were quite simple – repeating the user’s question after adjusting some pronouns, throwing in a stock phrase every now and then – the result was, in its own way, extremely convincing.

Second, “Eugene Goostman” uses the very techniques that began with Weizenbaum’s “Eliza” program from the 1960’s. We see the same tricks – repeating the judge’s statements with simple substitutions, keyword-triggered responses, falling back on vague or unresponsive replies, and the like. Those tricks are no more successful than they have been in the two decades of runnings of the Loebner Prize Competition, another ill-conceived attempt at running a Turing-like test. And there too, entrants used the trick of having their programs emulate humans with built-in excuses. “Eugene Goostman” purports to be a non-English-fluent child, rather than a fluent adult.[1]

In fact, anyone with the slightest familiarity with chatbots, the modern incarnations of “Eliza”, could unmask “Eugene Goostman” quickly. (The judges at the recent competition were apparently not so familiar.) Scott Aaronson has provided a transcript of his own interaction with “Eugene Goostman”, which shows the telltale signs as soon as the computer’s first reply, which blatantly ignores an utterly trivial question and tries to change the subject:

Scott: Which is bigger, a shoebox or Mount Everest?

Eugene: I can’t make a choice right now. I should think it out later. And I forgot to ask you where you are from…

Here’s an appeal to cute, vague language:

Scott: How many legs does a camel have?

Eugene: Something between 2 and 4. Maybe, three? :-))) By the way, I still don’t know your specialty – or, possibly, I’ve missed it?

And here’s the repetition of a canned response:

Scott: No, I need to know that you’re not a chatbot. Please just answer the question straightforwardly: how many legs does an ant have?

Eugene: Something between 2 and 4. Maybe, three? :-))) Oh, what a fruitful conversation;-)

Here’s a canned keyword-triggered response, nonresponsive as usual:

Scott: In your opinion, does your existence demonstrate any idea or principle that wasn’t demonstrated just as convincingly by ELIZA, Joseph Weizenbaum’s chatbot from the 1960s?

Eugene: Eliza was a break-thru. All the bots after it were nothing but weak parodies, claiming to have “revolutionary improvements”.

And here’s “Eliza”‘s trick of rewriting the judge’s utterance after pronoun replacement and (poorly executed) synonym substitution:

Scott: OK, I’ll count that as the second sensible thing you’ve said (grading generously). Don’t you think that a more convincing chatbot could be created, by using Google to trawl the entire web for plausible-sounding answers (analogous to what IBM’s Watson did)?

Eugene: Yes, I do think that a more convincing chatbot can be built by using google to trawl the all web for plausible – sounding – I hope I have right to have my own opinion. Maybe, let’s talk about something else? What would you like to discuss?

Literally every one of “Eugene”‘s responses reflects its “Eliza”-like programming. It would be amusing, if it weren’t so predictable.

In summary, “Eugene Goostman” is not qualitatively superior to other chatbots, and certainly has not passed a true Turing Test. It isn’t even close.


  1. In a parody of this approach, the late John McCarthy, professor of computer science at Stanford University and inventor of the term “artificial intelligence”, wrote a letter to the editor responding to a publication about an “Eliza”-like program that claimed to emulate a paranoid psychiatric patient. He presented his own experiments that I described in my Turing Test book: “He had designed an even better program, which passed the same test. His also had the virtue of being a very inexpensive program, in these times of tight money. In fact you didn’t even need a computer for it. All you needed was an electric typewriter. His program modeled infantile autism. And the transcripts – you type in your questions, and the thing just sits there and hums – cannot be distinguished by experts from transcripts of real conversations with infantile autistic patients.”
…altogether too much concern with the contents of the journal’s spine text… (“reference” image by flickr user Sara S. Used by permission.)

Precipitated by a recent request to review some proposals for new open-access journals, I spent some time gathering my own admittedly idiosyncratic thoughts on some of the issues that should be considered when founding new open-access journals. I make them available here. Good sources for more comprehensive information on launching and operating open-access journals are SPARC’s open-access journal publishing resource index and the Open Access Directory’s guides for OA journal publishers.

Unlike most of my posts, I may augment this post over time, and will do so without explicit marking of the changes. Your thoughts on additions to the topics below—via comments or email—are appreciated. A version number (currently version 1.0) will track the changes for reference.

It is better to flip a journal than to found one

The world has enough journals. Adding new open-access journals as alternatives to existing ones may be useful if there are significant numbers of high quality articles being generated in a field for which there is no reasonable open-access venue for publication. Such cases are quite rare, especially given the rise of open-access “megajournals” covering the sciences (PLoS ONE, Scientific Reports, AIP Advances, SpringerPlus, etc.), and the social sciences and humanities (SAGE Open). Where there are already adequate open-access venues (even if no one journal is “perfect” for the field), scarce resources are probably better spent elsewhere, especially on flipping journals from closed to open access.

Admittedly, the world does not have enough open-access journals (at least high-quality ones). So if it is not possible to flip a journal, founding a new one may be a reasonable fallback position, but it is definitely the inferior alternative.

Licensing should be by CC-BY

As long as you’re founding a new journal, its contents should be as open as possible consistent with appropriate attribution. That exactly characterizes the CC-BY license. It’s also a tremendously simple approach. Once the author grants a CC-BY license, no further rights need be granted to the publisher. There’s no need for talk about granting the publisher a nonexclusive license to publish the article, etc., etc. The CC-BY license already allows the publisher to do so. There’s no need to talk about what rights the author retains, since the author retains all rights subject to the nonexclusive CC-BY license. I’ve made the case for a CC-BY license at length elsewhere.

It’s all about the editorial board

The main product that a journal is selling is its reputation. A new journal with no track record needs high quality submissions to bootstrap that reputation, and at the start, nothing is more convincing to authors to submit high quality work to the journal than its editorial board. Getting high-profile names somewhere on the masthead at the time of the official launch is the most important thing for the journal to do. (“We can add more people later” is a risky proposition. You may not get a second chance to make a first impression.)

Getting high-profile names on your board may occur naturally if you use the expedient of flipping an existing closed-access journal, thereby stealing the board, which also has the benefit of acquiring the journal’s previous reputation and eliminating one more subscription journal.

Another good idea for jumpstarting a journal’s reputation is to prime the article pipeline by inviting leaders in the field to submit their best articles to the journal before its official launch, so that the journal announcement can provide information on forthcoming articles by luminaries.

Follow ethical standards

Adherence to the codes of conduct of the Open Access Scholarly Publishers Association (OASPA) and the Committee on Publication Ethics (COPE) should be fundamental. Membership in the organizations is recommended; the fees are extremely reasonable.

You can outsource the process

There is a lot of interest among certain institutions to found new open-access journals, institutions that may have no particular special expertise in operating journals. A good solution is to outsource the operation of the journal to an organization that does have special expertise, namely, a journal publisher. There are several such publishers who have experience running open-access journals effectively and efficiently. Some are exclusively open-access publishers, for example, Co-Action Publishing, Hindawi Publishing, Ubiquity Press. Others handle both open- and closed-access journals: HighWire Press, Oxford University Press, ScholasticaHQ, Springer/BioMed Central, Wiley. This is not intended as a complete listing (the Open Access Directory has a complementary offering), nor in any sense an endorsement of any of these organizations, just a comment that shopping the journal around to a publishing partner may be a good idea. Especially given the economies of scale that exist in journal publishing, an open-access publishing partner may allow the journal to operate much more economically than having to establish a whole organization in-house.

Certain functionality should be considered a baseline

Geoffrey Pullum, in his immensely satisfying essays “Stalking the Perfect Journal” and “Seven Deadly Sins in Journal Publishing”, lists his personal criteria in journal design. They are a good starting point, but need updating for the era of online distribution. (There is altogether too much concern with the contents of the journal’s spine text for instance.)

  • Reviewing should be anonymous (with regard to the reviewers) and blind (with regard to the authors), except where a commanding argument can be given for experimenting with alternatives.
  • Every article should be preserved in one (or better, more than one) preservation system. CLOCKSS, Portico[1], or a university or institutional archival digital repository are good options.
  • Every article should have complete bibliographic metadata on the first page, including license information (a simple reference to CC-BY; see above), and (as per Pullum) first and last page numbers.
  • The journal should provide DOIs for its articles. CrossRef membership is an inexpensive way to acquire the ability to assign DOIs. An article’s DOI should be included in the bibliographic metadata on the first page.

There’s additional functionality beyond this baseline that would be ideal, though the tradeoff against the additional effort required would have to be evaluated.

  • Provide article-level metrics, especially download statistics, though other “altmetrics” may be helpful.
  • Provide access to the articles in multiple formats in addition to PDF: HTML, XML with the NLM DTD.
  • Provide the option for readers to receive alerts of new content through emails and RSS feeds.
  • Encourage authors to provide the underlying data to be distributed openly as well, and provide the infrastructure for them to do so.

Take advantage of the networked digital era

Many journal publishing conventions of long standing are no longer well motivated in the modern era. Here are a few examples. They are not meant to be exhaustive. You can probably think of others. The point is that certain standard ideas can and should be rethought.

  • There is no longer any need for “issues” of journals. Each article should be published as soon as it is finished, no later and no sooner. If you’d like, an “issue” number can be assigned that is incremented for each article. (Volumes, incremented annually, are still necessary because many aspects of the scholarly publishing and library infrastructure make use of them. They are also useful for the purpose of characterizing a bolus of content for storage and preservation purposes.)
  • Endnotes, a relic of the day when typesetting was a complicated and fraught process that was eased by a human being not having to determine how much space to leave at the bottom of a page for footnotes, should be permanently retired. Footnotes are far easier for readers (which is the whole point really), and computers do the drudgery of calculating the space for them.
  • Page limits are silly. In the old physical journal days, page limits had two purposes. They were necessary because journal issues came in quanta of page signatures, and therefore had fundamental physical limits to the number of pages that could be included. A network-distributed journal no longer has this problem. Page limits also serve the purpose of constraining the author to write relatively succinctly, easing the burden on reviewer and (eventually) reader. But for this purpose the page is not a robust unit of measurement of the constrained resource, the reviewers’ and the readers’ attention. One page can hold anything from a few hundred to a thousand or more words. If limits are to be required, they should be stated in appropriate units such as the number of words. The word count should not include figures, tables, or bibliography, as they impinge on readers’ attention in a qualitatively different way.
  • Author-date citation is far superior to numeric citation in every way except for the amount of space and ink required. Now that digital documents use no physical space or ink, there is no longer an excuse for numeric citations. Similarly, ibid. and op. cit. should be permanently retired. I appreciate that different fields have different conventions on these matters. That doesn’t change the fact that those fields that have settled on numeric citations or ibidded footnotes are on the wrong side of technological history.
  • Extensive worry about and investment in fancy navigation within and among the journal’s articles is likely to be a waste of time, effort, and resources. To first approximation, all accesses to articles in the journal will come from sites higher up in the web food chain—the Google’s and Bing’s, the BASE’s and OAIster’s of the world. Functionality that simplifies navigation among articles across the whole scholarly literature (cross-linked DOIs in bibliographies, for instance, or linked open metadata of various sorts) is a different matter.

Think twice

In the end, think long and hard about whether founding a new open-access journal is the best use of your time and your institution’s resources in furthering the goals of open scholarly communication. Operating a journal is not free, in cash and in time. Perhaps a better use of resources is making sure that the academic institutions and funders are set up to underwrite the existing open-access journals in the optimal way. But if it’s the right thing to do, do it right.


  1. A caveat on Portico’s journal preservation service: The service is designed to release its stored articles when a “trigger event” occurs, for instance, if the publisher ceases operations. Unfortunately, Portico doesn’t release the journal contents openly, but only to its library participants, even for OA journals. However, if the articles were licensed under CC-BY, any participating library could presumably reissue them openly.
…our little tiff in the late 18th century… (“NYC – Metropolitan Museum of Art: Washington Crossing the Delaware” image by flickr user wallyg. Used by permission.)

I’m shortly off to give a talk at the annual meeting of the Linguistic Society of America (on why open access is better for scholarly societies, which I’ll be blogging about soon), but in the meantime, a linguistically related post about punctuation.

Careful readers of this blog (are there any careful readers of this blog? are there any readers at all?) will note that I generally eschew the peculiarly American convention of moving punctuation within a closing quotation mark. Examples from The Occasional Pamphlet abound: here, here, here, here, here, here, here, and here. And that’s just from 2012. It’s surprising how often this punctuation convention comes into play.

Instead, I use the convention that only the stuff being quoted is put within the quotation marks. This is sometimes called the “British” convention, despite the fact that other nationalities use it as well, presumably to emphasize the American/British dualism extant from our little tiff in the late 18th century. I use the “British” convention because the “American” convention is, in technical terms, stupid.

The story goes that punctuation appearing within the quotation mark is more aesthetically pleasing than punctuation outside the quotation mark. But even if that were true, clarity trumps beauty. Moving the punctuation means that when you see a quoted string with some final punctuation, you don’t know if that punctuation is or is not intended to be part of the thing being quoted; it is systematically ambiguous.

Apparently, my view is highly controversial. For example, when working with MIT Press on my book on the Turing test, my copy editor (who, by the way, was wonderful, and amazingly patient) moved all my punctuation around to satisfy the American convention. I moved them all back. She moved them again. We got into a long discussion of the matter; it seems she had never confronted an author who felt strongly about punctuation before. (I presume she had never copy-edited Geoff Pullum, from whom more later.) As a compromise, we left the punctuation the way I liked it—mostly—but she made me add the following prefatory editorial note:

Throughout the text, the American convention of moving punctuation within closing quotation marks (whether or not the punctuation is part of what is being referred to) is dropped in favor of the more logical and consistent convention of placing only the quoted material within the marks.

I would now go on to explain why the “British” convention is better than the “stupid” convention, except that Geoff Pullum has done so much better a job, far better than I ever could. Here is an excerpt from his essay “Punctuation and human freedom” published in Natural Language and Linguistic Theory and reproduced in his book The Great Eskimo Vocabulary Hoax. I recommend the entire essay to you.

I want you to first consider the string ‘the string’ and the string ‘the string.’, noting that it takes ten keystrokes to type the string in the first set of quotes, and eleven to type the string in the second pair. Imagine you wanted to quote me on the latter point. You might want to say (1).

(1) Pullum notes that it takes eleven keystrokes to type the string ‘the string.’

No problem there; (1) is true (and grammatical if we add a final period). But now suppose you want to say this:

(2) Pullum notes that it takes ten keystrokes to type the string ‘the string’.

You won’t be able to publish it. Your copy-editor will change it before the first proof stage to (3), which is false (though regarded by copy-editors as grammatical):

(3) Pullum notes that it takes ten keystrokes to type the string ‘the string.’

Why? Because the copy-editor will insist that when a sentence ends with a quotation, the closing quotation mark must follow the punctuation mark.

I say this must stop. Linguists have a duty to the public to use their expertise in arguing for changes to the fabric of society when its interests are threatened. And we have such a situation here.

What say we all switch over to the logical quotation punctuation approach and save the fabric of society, shall we?

Karen Spärck Jones, 1935-2007

In honor of Ada Lovelace Day 2012, I write about the only female winner of the Lovelace Medal awarded by the British Computer Society for “individuals who have made an outstanding contribution to the understanding or advancement of Computing”. Karen Spärck Jones was the 2007 winner of the medal, awarded shortly before her death. She also happened to be a leader in my own field of computational linguistics, a past president of the Association for Computational Linguistics. Because we shared a research field, I had the honor of knowing Karen and the pleasure of meeting her on many occasions at ACL meetings.

One of her most notable contributions to the field of information retrieval was the idea of inverse document frequency. Well before search engines were a “thing”, Karen was among the leaders in figuring out how such systems should work. Already in the 1960’s there had arisen the idea of keyword searching within sets of documents, and the notion that the more “hits” a document receives, the higher ranked it should be. Karen noted in her seminal 1972 paper “A statistical interpretation of term specificity and its application in retrieval” that not all hits should be weighted equally. For terms that are broadly distributed throughout the corpus, their occurrence in a particular document is less telling than occurrence of terms that occur in few documents. She proposed weighting each term by its “inverse document frequency” (IDF), which she defined as log(N/(n + 1)) + 1, where N is the number of documents and n the number of documents containing the keyword under consideration. When the keyword occurs in all documents, IDF approaches 1 for large N, but as the keyword occurs in fewer and fewer documents (making it a more specific and presumably more important keyword), IDF rises. The two notions of weighting (frequency of occurrence of the keyword together with its specificity as measured by inverse document frequency) are combined multiplicatively in the by now standard tf*idf metric; tf*idf or its successors underlie essentially all information retrieval systems in use today.
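To make the weighting concrete, here is a minimal sketch in Python over a three-document toy corpus of my own invention; the tokenization and the particular IDF variant are illustrative assumptions, and real retrieval systems differ in many details.

    # Toy tf*idf computation in the spirit of the description above.
    # The corpus, tokenizer, and IDF variant are illustrative assumptions.
    import math
    from collections import Counter

    docs = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "information retrieval weights query terms by rarity",
    ]

    N = len(docs)
    tokenized = [d.split() for d in docs]

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1

    def idf(term):
        # One smoothed variant of inverse document frequency.
        return math.log(N / (df[term] + 1)) + 1

    def tfidf(term, doc_index):
        tf = tokenized[doc_index].count(term)   # raw term frequency
        return tf * idf(term)

    # "the" occurs in two of the three documents, so each of its occurrences
    # carries less weight than an occurrence of the rarer "cat".
    print(round(idf("the"), 3), round(idf("cat"), 3))        # 1.0 1.405
    print(round(tfidf("the", 0), 3), round(tfidf("cat", 0), 3))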

In Karen’s interview for the Lovelace Medal, she opined that “Computing is too important to be left to men.” Ada Lovelace would have agreed.

Talmud and the Turing Test

June 16th, 2012

…the Golem… (Image of the statue of the Golem of Prague at the entrance to the Jewish Quarter of Prague by flickr user D_P_R. Used by permission, CC-BY 2.0.)

Alan Turing, the patron saint of computer science, was born 100 years ago this week (June 23). I’ll be attending the Turing Centenary Conference at University of Cambridge this week, and am honored to be giving an invited talk on “The Utility of the Turing Test”. The Turing Test was Alan Turing’s proposal for an appropriate criterion to attribute intelligence (that is, capacity for thinking) to a machine: you verify through blinded interactions that the machine has verbal behavior indistinguishable from a person.

In preparation for the talk, I’ve been looking at the early history of the premise behind the Turing Test, that language plays a special role in distinguishing thinking from nonthinking beings. I had thought it was an Enlightenment idea, that until the technological advances of the 16th and 17th centuries, especially clockwork mechanisms, the whole question of thinking machines would never have received substantive discussion. As I wrote earlier,

Clockwork automata provided a foundation on which one could imagine a living machine, perhaps even a thinking one. In the midst of the seventeenth-century explosion in mechanical engineering, the issue of the mechanical nature of life and thought is found in the philosophy of Descartes; the existence of sophisticated automata made credible Descartes’s doctrine of the bête-machine (beast-machine), that animals were machines. His argument for the doctrine incorporated the first indistinguishability test between human and machine, the first Turing test, so to speak.

But I’ve seen occasional claims here and there that there is in fact a Talmudic basis to the Turing Test. Could this be true? Was the Turing Test presaged, not by centuries, but by millennia?

Uniformly, the evidence for Talmudic discussion of the Turing Test is a single quote from Sanhedrin 65b.

Rava said: If the righteous wished, they could create a world, for it is written, “Your iniquities have been a barrier between you and your God.” For Rava created a man and sent him to R. Zeira. The Rabbi spoke to him but he did not answer. Then he said: “You are [coming] from the pietists: Return to your dust.”

Rava creates a Golem, an artificial man, but Rabbi Zeira recognizes it as nonhuman by its lack of language and returns it to the dust from which it was created.

This story certainly describes the use of language to unmask an artificial human. But is it a Turing Test precursor?

It depends on what one thinks are the defining aspects of the Turing Test. I take the central point of the Turing Test to be a criterion for attributing intelligence. The title of Turing’s seminal Mind article is “Computing Machinery and Intelligence”, wherein he addresses the question “Can machines think?”. Crucially, the question is whether the “test” being administered by Rabbi Zeira is testing the Golem for thinking, or for something else.

There is no question that verbal behavior can be used to test for many things that are irrelevant to the issues of the Turing Test. We can go much earlier than the Mishnah to find examples. In Judges 12:5–6 (King James Version)

5 And the Gileadites took the passages of Jordan before the Ephraimites: and it was so, that when those Ephraimites which were escaped said, Let me go over; that the men of Gilead said unto him, Art thou an Ephraimite? If he said, Nay;

6 Then said they unto him, Say now Shibboleth: and he said Sibboleth: for he could not frame to pronounce it right. Then they took him, and slew him at the passages of Jordan: and there fell at that time of the Ephraimites forty and two thousand.

The Gileadites use verbal indistinguishability (of the pronunciation of the original shibboleth) to unmask the Ephraimites. But they aren’t executing a Turing Test. They aren’t testing for thinking but rather for membership in a warring group.

What is Rabbi Zeira testing for? I’m no Talmudic scholar, so I defer to the experts. My understanding is that the Golem’s lack of language indicated not its own deficiency per se, but the deficiency of its creators. The Golem is imperfect in not using language, a sure sign that it was created by pietistic kabbalists who themselves are without sufficient purity.

Talmudic scholars note that the deficiency the Golem exhibits is intrinsically tied to the method by which the Golem is created: language. The kabbalistic incantations that ostensibly vivify the Golem were generated by mathematical combinations of the letters of the Hebrew alphabet. Contemporaneous understanding of the Golem’s lack of speech was connected to this completely formal method of kabbalistic letter magic: “The silent Golem is, prima facie, a foil to the recitations involved in the process of his creation.” (Idel, 1990, pages 264–5) The imperfection demonstrated by the Golem’s lack of language is not its inability to think, but its inability to wield the powers of language manifest in Torah, in prayer, in the creative power of the kabbalist incantations that gave rise to the Golem itself.

Only much later does interpretation start connecting language use in the Golem to soul, that is, to an internal flaw: “However, in the medieval period, the absence of speech is related to what was conceived then to be the highest human faculty: reason according to some writers, or the highest spirit, Neshamah, according to others.” (Idel, 1990, page 266, emphasis added)

By the 17th century, the time was ripe for consideration of whether nonhumans had a rational soul, and how one could tell. Descartes’s observations on the special role of language then serve as the true precursor to the Turing Test. Unlike the sole Talmudic reference, Descartes discusses the connection between language and thinking in detail and in several places — the Discourse on the Method, the Letter to the Marquess of Newcastle — and his followers — Cordemoy, La Mettrie — pick up on it as well. By Turing’s time, it is a natural notion, and one that Turing operationalizes for the first time in his Test.

The test of the Golem in the Sanhedrin story differs from the Turing Test in several ways. There is no discussion that the quality of language use was important (merely its existence), no mention of indistinguishability of language use (but Descartes didn’t either), and certainly no consideration of Turing’s idea of blinded controls. But the real point is that at heart the Golem test was not originally a test for the intelligence of the Golem at all, but of the purity of its creators.

References

Idel, Moshe. 1990. Golem: Jewish Magical and Mystical Traditions on the Artificial Anthropoid. Albany, N.Y.: State University of New York Press.