Making CAPTCHAs Productive

from the now-there's-a-good-idea dept

About five years ago, Luis von Ahn was the Ph.D. student who came up with the idea for CAPTCHAs, the little requests to "type this" before you can fill out a form or sign up for a service. These days, of course, such CAPTCHAs have become nearly ubiquitous. Since then, von Ahn has gone on to create other online systems that figured out ways to shift labor resources to users, such as the ESP Game, which is designed to make image search much more effective (and which Google eventually licensed). However, it seems that von Ahn has switched his attention back to CAPTCHAs after recognizing what a productivity drain they must be. The nice thing about the ESP Game is that the end result benefits image search. CAPTCHAs only help weed out spammers and scammers. However, John writes in to let us know that von Ahn's latest work is about making CAPTCHAs useful. What he's done is make the text that users have to type come from books and other printed materials that are being scanned by Brewster Kahle's Internet Archive project. That way, each time people are simply trying to enter a comment on a website, they're also helping to turn a scanned word into text for the Internet Archive. Of course, if someone were really sneaky, they would just do the same sort of thing and hook it up to Amazon's Mechanical Turk and keep all the earnings. Every time someone entered a comment on a site, it would earn you money. So, if anyone wants to do this, please reserve a cut for me.


Reader Comments

  1.  
    scottbp, May 24th, 2007 @ 8:49pm

    A nice idea but...

    The whole idea of using CAPTCHAs to "shift labor resources to users" seems to me to be a crazy UI design decision.

    Personally, when I am designing sites and applications I try to increase usability and decrease cognitive load for the user. This seems backward to me. It also seems counterproductive to put more barriers in front of a user right when we are already asking them to search their memories for sign-in or registration-type information.

    The ESP game works because it is a game, and produces useful info as a byproduct. This new CAPTCHA scheme takes something that we should already be doing on the application side (identifying robots) and then makes it even more complicated. I think we should be decreasing use of tools like CAPTCHAs, not adding more complexity to the system.

    Of course I think this idea is quite ingenious, I just don't want to use it on any of the sites I design.

     


  2.  
    ChurchHatesTucker, May 24th, 2007 @ 9:12pm

    OK, I may be just slow, but...

    Isn't the whole point of a captcha that the text is (a) known, and (b) hard for an OCR program to decipher?

    If (a) is true, what's the point?

    If (b) is true, how is the challenging system going to know if the response is correct?

     


  3.  
    Ajax 4Hire, May 24th, 2007 @ 10:04pm

    I had forgotten about the CAPTCHA,

    thanks for the article reminding me of the name.

    In the early 21st century these were used to help site owners distinguish between human and machine. The only problem was that graphics engines, facial recognition and increased computing Zs (archaic term for MHz/horsepower) allowed for simple character recognition as good as, and sometimes better than, a human's.

    Consider the problem of "recognizing" 1 (one) and l (ell); an upper case letter entered as lower case; zer0 and Oh.

    CAPTCHA was transcended by similar techniques that required a Turing-style test to gestalt the GIF/JPG/MPG/264.

    Next used were images/pictures with a question of "what is this?" (answer: flower), moving beyond text-based recognition to simple images.

    But where the CAPTCHA was a minor irritation, image recognition was more frustrating: multiple valid answers (like the zer0/Oh) caused more ire directed at the site.

    A short-lived attempt was made to use near-current-event questions, similar to the World War II "Who won the World Series last year?" questions: a query that only a human, or someone on your side, would know.

    CAPTCHA and image queries were followed by secondary email authentication; a user must provide an email address and respond to THAT. This also proved to be relatively easy to overcome, as machine-generated email and email filtering/recognition were advanced enough to parse the query and provide the appropriate response.

    There were also some short-lived attempts to validate through the exchange of fractional currency (Microsoft, eBay/Paypal, Oracle all tried Bank/CreditCard/Currency based checks on the assumption that only a human was stupid enough to give up access to a currency exchange account).

    By the early teens (2017 uwantwat.com is probably the best early example), sites became indistinguishable from human responses in Turing tests. In fact, the best false positive test (machine passes as human) was summed up in the statement:
    "to human is to err."

    Turing tests started using the statistical expectation of a slightly wrong answer. But again, the basic problem is that a machine is trying to authenticate a real human response. Given sufficient access to the machine, you can craft a complement machine to give the expected response.

    Read your history books, it's all in there.

     


  4.  
    Anonymous Coward, May 25th, 2007 @ 5:28am

    "If (b) is true, how is the challenging system going to know if the response is correct?"

    I totally agree!

     


  5.  
    JBB, May 25th, 2007 @ 7:01am

    Knowing if the response is correct...

    The system uses two words in the CAPTCHA. The first is a known word. The second is one the OCR didn't recognize. If the first one is entered correctly, the system knows you're a human. It then records the second one and compares that answer with other people's answers, and if enough agree it decides that's the unOCRable word.

     


  6.  
    Matt, May 25th, 2007 @ 10:46am

    Re: OK, I may be just slow, but...

    That's what I was about to say...

    CAPTCHA is meant to match up text (user input) to text in an image, meaning that the text is already known.

     


  7.  
    Jim Schrempp, May 25th, 2007 @ 3:30pm

    ESP game is routinely hacked

    A while back I played the Google implementation of The ESP Game and found that it was being hacked. I believe robots meet in it and pollute the results. I documented it all at:

    http://www.jimschrempp.com/features/computer/googleimagelabeler.htm

    I'd enjoy hearing others' opinions.

    Jim

     


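The two-word scheme JBB describes in comment 5 can be sketched roughly as below. This is only an illustration of the idea, not von Ahn's or the Internet Archive's actual implementation: the class name `TwoWordCaptcha`, the image id, and the thresholds `min_votes` and `min_agreement` are all assumptions made up for the example.

```python
from collections import Counter


class TwoWordCaptcha:
    """Hypothetical sketch: one known control word, one word OCR could not read."""

    def __init__(self, min_votes=3, min_agreement=0.75):
        # Both thresholds are illustrative assumptions, not values from the article.
        self.min_votes = min_votes
        self.min_agreement = min_agreement
        self.votes = {}  # unknown-word image id -> Counter of typed guesses

    @staticmethod
    def _normalize(text):
        # Ignore case and surrounding whitespace when comparing answers.
        return text.strip().lower()

    def submit(self, control_answer, control_typed, unknown_id, unknown_typed):
        """Return True if the control word matches; only then record the guess."""
        if self._normalize(control_typed) != self._normalize(control_answer):
            return False  # failed the known word: treat as a bot, discard the guess
        guesses = self.votes.setdefault(unknown_id, Counter())
        guesses[self._normalize(unknown_typed)] += 1
        return True

    def resolved(self, unknown_id):
        """Return the agreed transcription, or None if there is no consensus yet."""
        guesses = self.votes.get(unknown_id)
        if not guesses:
            return None
        total = sum(guesses.values())
        word, top = guesses.most_common(1)[0]
        if total >= self.min_votes and top / total >= self.min_agreement:
            return word
        return None
```

The point of the design is that the unknown word never decides whether a user passes. Only the known word does, so a guess is recorded solely from someone who has already been judged human, and a transcription is accepted only once enough of those guesses agree.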

