Why Eugene Goostman Did Not Pass the Turing Test

Alan Turing's iconic test for computer intelligence involves three players: two human beings and the candidate computer. One of the humans is the examiner or judge, and the other--the 'foil'--serves as a point of comparison. The judge has to try and figure out which of the other two participants is which, human or non-human, simply by chatting with them via a keyboard. The foil's job is to help the judge make the right identification.

A number of chat sessions are run using different judges and different foils, and if the judges are mistaken often enough about which contestant is which, the computer has passed the test.

No questions are barred--the computer must be able to deal fair and square with anything the judge throws at it. But in order to avoid loading the dice against the computer, Turing stipulated that the judges 'should not be expert about machines'.

He also said that the computer is allowed to use 'all sorts of tricks' to bring about a wrong identification. Smart moves for the computer are to reply 'No' to 'Are you a computer?', and to follow a request to multiply one huge number by another with a long pause and an incorrect answer--but a plausibly incorrect answer, not simply a random number.

Turing pointed out that the judges can use keyboard chat to probe the computer's skill in almost all fields of human endeavour. His examples included mathematics, chess, poetry, and flirting. As to whether passing the test is sufficient to prove the computer can think, Turing acknowledged that the question 'Can machines pass the test?' is 'not the same as' the question 'Do machines think?', but said that the first question nevertheless 'seems near enough' to the second, and 'raises much the same difficulties'.

The Turing Test is extremely tough for a computer to pass. We humans find idle chit-chat across the dinner table easy, but if you step back and consider what is involved you realize that, even in quite trivial conversations, we are manipulating vast amounts of knowledge, and are effortlessly producing and comprehending complex linguistic structures, often highly idiomatic ones, as well as expertly handling such obstacles to comprehension as irony, metaphor, creative humour, malformed or unfinished sentences, and abrupt and unannounced changes of subject. None of these, with the possible exception of the first, are things that today's computers excel at.

Turing knew his test was ultra-tough--this was the point of it--and he did not expect a computer to pass any time soon.

That's why it came as such a surprise when two researchers, Huma Shah and Kevin Warwick, announced to the press that a program had passed the Turing Test last weekend, in an experiment they ran at the Royal Society, Britain's premier organization for the advancement of science (http://www.reading.ac.uk/news-and-events/releases/PR583836.aspx). The program, named Eugene Goostman, is described as simulating a 13-year-old boy's conversation.

Warwick and Shah have stated that, in order for a computer to pass the Turing Test, the judges must mistake the computer for the human more than 30% of the time, during a series of 5-minute conversations. They reported that Eugene Goostman actually exceeded this score in their experiment, managing to convince 33% of the judges that it was human.
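The arithmetic behind the reported result can be sketched in a few lines of Python. This is only an illustration: the function names and the session counts are hypothetical (the experiment's raw counts are not given here), and the 30% figure is simply the threshold that Warwick and Shah stated.

```python
def misidentification_rate(verdicts):
    """Fraction of sessions in which the judge took the machine for the human."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    # True counts as 1 and False as 0, so sum() counts the wrong identifications.
    return sum(verdicts) / len(verdicts)


def exceeds_threshold(verdicts, threshold=0.30):
    """True if the machine was misidentified in more than `threshold` of sessions."""
    return misidentification_rate(verdicts) > threshold


# Hypothetical data, not the experiment's actual counts:
# 10 wrong identifications out of 30 sessions, i.e. about 33%.
verdicts = [True] * 10 + [False] * 20
print(f"{misidentification_rate(verdicts):.0%}")  # 33%
print(exceeds_threshold(verdicts))                # True
```

With these made-up counts the rate comes out at about 33%, which clears a 30% threshold. Whether that threshold is the right one is a separate matter.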

In 1950, Turing made a famous prediction about the rate of progress he expected toward passing his test. He said that by the year 2000, a program would do well enough in the test to fool 30% of the judges when the conversations were limited to 5 minutes. He also said that it would be at least 2050 before a program was actually able to pass the test. Obviously, then, Turing did not consider that fooling 30% of the judges during a series of 5-minute conversations should count as passing the test.

Shah and Warwick's mistaken claim that Eugene Goostman passed Turing's test rests on their having confused his prediction about the year 2000 for a statement of what counts as passing the test. Turing's own careful specification of what actually does count as passing his test simply gets ignored.

It is interesting to see Turing's prediction coming true--we can surely forgive him for being just a few years out. But, as Turing thought, the actual test may not be cracked for many years yet.
