reCAPTCHA
October 2nd, 2007
So, some bright sparks at Carnegie Mellon found a way to modify the CAPTCHA anti-spam system to aid in the digitization of old books. They call it reCAPTCHA. OCR scanning of old books is somewhat reliable, but a lot of individual words still get misread. These guys distribute the words that were obviously misread to CAPTCHA interfaces all over the place, so that people can verify them. It should be working on this blog right now.
Take a look at the comment interface underneath this post. Go ahead, scroll down and come back. I’ll wait.
There are two words down there. One is a control word, which is really just a traditional CAPTCHA: it gets checked against the known correct answer. The other is an OCR-error word. If multiple people solve the OCR-error word the same way, that reading gets accepted as official, and the sum total of preserved human knowledge has been increased, slightly. It also comes in WordPress plugin flavor.
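For the curious, here is roughly how I picture the two-word check working. I haven't seen the actual reCAPTCHA code, so treat this as a minimal sketch of the idea rather than their implementation: the names `UnknownWord` and `check_submission`, the vote threshold, and the rule that a guess only counts when the control word was typed correctly are all my own assumptions.

```python
from collections import Counter


class UnknownWord:
    """An OCR-error word waiting for human agreement (hypothetical structure)."""

    def __init__(self, word_id, votes_needed=3):
        self.word_id = word_id
        self.votes_needed = votes_needed  # assumed threshold, not a documented value
        self.votes = Counter()
        self.accepted = None  # set once enough people agree on a reading

    def record(self, typed):
        """Count one human reading; accept it once it reaches the threshold."""
        reading = typed.strip().lower()
        self.votes[reading] += 1
        if self.accepted is None and self.votes[reading] >= self.votes_needed:
            self.accepted = reading
        return self.accepted


def check_submission(control_answer, control_typed, unknown, unknown_typed):
    """Return True if the commenter passes the CAPTCHA.

    Only the control word decides pass/fail. The reading of the unknown
    word is counted only when the control word was right (an assumption).
    """
    if control_typed.strip().lower() != control_answer.strip().lower():
        return False
    unknown.record(unknown_typed)
    return True


# Two commenters agree on the unknown word; one fails the control word.
word = UnknownWord("scan-001", votes_needed=2)
check_submission("morning", "Morning ", word, "upon")   # passes, first vote
check_submission("morning", "mourning", word, "upon")   # fails, vote ignored
check_submission("morning", "morning", word, " Upon")   # passes, second vote
print(word.accepted)
```

The point of the design is that the site only ever needs to know the answer to one of the two words. The commenter can't tell which word is which, so the honest strategy is to read both carefully.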
So, I’ll be trying this system out for a while. Right now I’m using a moderation queue to vet new posters, allowing unlimited unmoderated posts from vetted posters. If reCAPTCHA works well enough, I might disable the moderation queue.
3 Comments
1. Sarah | October 2nd, 2007 at 2:17 pm
Just to try this thing out…
Hi Dave.
2. Jesse | October 2nd, 2007 at 3:17 pm
preservedHumanKnowledge++;
Definitely gonna install this on my own blog (that I never really bother to use) at some point.
3. jimu | October 7th, 2007 at 4:45 pm
this is so cool I helped a bit!!