June 10th, 2014
|…that’s not Turing’s Test…
“Turing Test” image from xkcd. Used by permission.
There has been a flurry of interest in the Turing Test in the last few days, precipitated by a claim that (at last!) a program has passed the Test. The program in question is called “Eugene Goostman” and the claim is promulgated by Kevin Warwick, a professor of cybernetics at the University of Reading and organizer of a recent chatbot competition there.
The Turing Test is a topic that I have a deep interest in (see this, and this, and this, and this, and, most recently, this), so I thought to give my view on Professor Warwick’s claim “We are therefore proud to declare that Alan Turing’s Test was passed for the first time on Saturday.” The main points are these. The Turing Test was not passed on Saturday, and “Eugene Goostman” seems to perform qualitatively about as poorly as many other chatbots in emulating human verbal behavior. In summary: There’s nothing new here; move along.
First, the Turing Test that Turing had in mind was a criterion of indistinguishability in verbal performance between human and computer in an open-ended wide-ranging interaction. In order for the Test to be passed, judges had to perform no better than chance in unmasking the computer. But in the recent event, the interactions were quite time-limited (only five minutes) and in any case, the purported Turing-Test-passing program was identified correctly more often than not by the judges (almost 70% of the time in fact). That’s not Turing’s test.
Update June 17, 2014: The time limitation was even worse than I thought. According to my colleague Luke Hunsberger, computer science professor at Vassar College, who was a judge in this event, the five minute time limit was for two simultaneous interactions. Further, there were often substantial response delays in the system. In total, he estimated that a judge might average only four or five rounds of chat with each interlocutor. I’ve argued before that a grossly time-limited Turing Test is no Turing Test at all.
Sometimes, people trot out the prediction from Turing’s seminal 1950 Mind article that “I believe that in about fifty years’ time it will be possible to programme computers, with a storage capacity of about \(10^9\), to make them play the imitation game so well that an average interrogator will not have more than 70 per cent. chance of making the right identification after five minutes of questioning.” As I explain in my book on the Test:
The first thing to note about the prediction is that it is not a prediction about the Test per se: Turing expects 70 percent prediction accuracy, not the more difficult 50 percent expected by chance, and this after only a limited conversation of five minutes. He is therefore predicting passage of a test much simpler than the Test.
Not only does the prediction not presuppose a full Turing Test, but it could well be argued that it had already come to pass with the advent of Joseph Weizenbaum’s Eliza some thirty-five years early. Weizenbaum developed a simple computer program that attempted to imitate the parroting conversational style of a Rogerian psychotherapist…. Although the methods used were quite simple – repeating the user’s question after adjusting some pronouns, throwing in a stock phrase every now and then – the result was, in its own way, extremely convincing.
Second, “Eugene Goostman” uses the very techniques that began with Weizenbaum’s “Eliza” program from the 1960′s. We see the same tricks – repeating the judge’s statements with simple substitutions, keyword-triggered responses, falling back on vague or unresponsive replies, and the like. Those tricks are no more successful than they have been in the two decades of runnings of the Loebner Prize Competition, another ill-conceived attempt at running a Turing-like test. And there too, entrants used the trick of having their programs emulate humans with built in excuses. “Eugene Goostman” purports to be a non-English-fluent child, rather than a fluent adult.1
In fact, anyone with the slightest familiarity with chatbots, the modern incarnations of “Eliza”, could unmask “Eugene Goostman” quickly. (The judges at the recent competition were apparently not so familiar.) Scott Aaronson has provided a transcript of his own interaction with “Eugene Goostman”, which shows the telltale signs as soon as the computer’s first reply, which blatantly ignores an utterly trivial question and tries to change the subject:
Scott: Which is bigger, a shoebox or Mount Everest?
Eugene: I can’t make a choice right now. I should think it out later. And I forgot to ask you where you are from…
Here’s an appeal to cute, vague language:
Scott: How many legs does a camel have?
Eugene: Something between 2 and 4. Maybe, three? :-))) By the way, I still don’t know your specialty – or, possibly, I’ve missed it?
And here’s the repetition of a canned response:
Scott: No, I need to know that you’re not a chatbot. Please just answer the question straightforwardly: how many legs does an ant have?
Eugene: Something between 2 and 4. Maybe, three? :-))) Oh, what a fruitful conversation;-)
Here’s a canned keyword-triggered response, nonresponsive as usual:
Scott: In your opinion, does your existence demonstrate any idea or principle that wasn’t demonstrated just as convincingly by ELIZA, Joseph Weizenbaum’s chatbot from the 1960s?
Eugene: Eliza was a break-thru. All the bots after it were nothing but weak parodies, claiming to have “revolutionary improvements”.
And here’s “Eliza”‘s trick of rewriting the judge’s utterance after pronoun replacement and (poorly executed) synonym substitution:
Scott: OK, I’ll count that as the second sensible thing you’ve said (grading generously). Don’t you think that a more convincing chatbot could be created, by using Google to trawl the entire web for plausible-sounding answers (analogous to what IBM’s Watson did)?
Eugene: Yes, I do think that a more convincing chatbot can be built by using google to trawl the all web for plausible – sounding – I hope I have right to have my own opinion. Maybe, let’s talk about something else? What would you like to discuss?
Literally every one of “Eugene”‘s responses reflects its “Eliza”-like programming. It would be amusing, if it weren’t so predictable.
In summary, “Eugene Goostman” is not qualitatively superior to other chatbots, and certainly has not passed a true Turing Test. It isn’t even close.
- In a parody of this approach, the late John McCarthy, professor of computer science at Stanford University and inventor of the term “artifical intelligence”, wrote a letter to the editor responding to a publication about an “Eliza”-like program that claimed to emulate a paranoid psychiatric patient. He presented his own experiments that I described in my Turing Test book: “He had designed an even better program, which passed the same test. His also had the virtue of being a very inexpensive program, in these times of tight money. In fact you didn’t even need a computer for it. All you needed was an electric typewriter. His program modeled infantile autism. And the transcripts – you type in your questions, and the thing just sits there and hums – cannot be distinguished by experts from transcripts of real conversations with infantile autistic patients.”↩