reCAPTCHA: a Captcha that Helps Digitize Books
You’ve probably encountered a captcha by now: a form on the Web that shows an image and asks you either to type some text from the image into a blank or to answer a question about it. You might run into one before commenting on someone’s weblog, sending an e-mail, or doing something similar. Check out this new kind of captcha from Carnegie Mellon’s School of Computer Science: reCAPTCHA. There is even a special version to protect e-mail addresses. reCAPTCHA takes words that computers can’t read from digitized works (currently material from the Internet Archive) and cycles them through various captchas so that people do the interpretive work. If enough people agree on the same reading of a scanned word, the computers accept that reading as probably correct. The developers estimate that people solve about 60 million captchas a day. If an average person takes about ten seconds to solve one, that comes to roughly 600 million seconds, or more than 160,000 hours, of human effort every day. The amount of digitized text that needs proofreading is quite large, but distributing the work in this fashion makes it considerably more efficient while adding a layer of security to a number of Web sites through these completely automated public Turing tests to tell computers and humans apart.
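For the curious, the "enough people agree" idea can be sketched in a few lines of Python. This is only an illustration of agreement-based voting, not reCAPTCHA's actual code: the function name `consensus_reading` and the thresholds `min_votes` and `min_share` are made up for the example, and the real service's rules are not described here.

```python
from collections import Counter


def consensus_reading(answers, min_votes=3, min_share=0.75):
    """Return the reading most people agree on, or None if agreement is weak.

    min_votes and min_share are illustrative thresholds (assumptions),
    not values taken from reCAPTCHA.
    """
    # Normalize so that "Morning " and "morning" count as the same answer.
    # Lowercasing is a simplification; it throws away capitalization.
    cleaned = [a.strip().lower() for a in answers if a.strip()]
    if len(cleaned) < min_votes:
        return None  # too few answers to trust any of them yet

    word, votes = Counter(cleaned).most_common(1)[0]
    if votes >= min_votes and votes / len(cleaned) >= min_share:
        return word
    return None  # people disagree; keep showing the word to more people


# Three of four people typed the same thing, so that reading is accepted.
print(consensus_reading(["morning", "Morning ", "morning", "mourning"]))  # morning
```

A word that doesn't reach the threshold simply stays in rotation until more people have had a look at it.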
Chief software architect Ben Maurer writes about it on his weblog.




