Captcha sites digitizing books

  • picklemonkey
    Double hoodie beer monster
    • Jun 2004
    • 15373

    Captcha sites digitizing books

    Many of you may already know this, but I thought it was really cool. Did you know that Captchas are being used to digitize books?




    Here are some excerpts:
    Today it has become the principal method used by Google to authenticate text in Google Books, its vast project to digitize and disseminate rare and out-of-print texts on the Internet.

    Digitization is normally a three-stage process: create a photographic image of the text, also known as a bitmap; encode the text in a compact, easily handled and searchable form using optical character recognition software, commonly called O.C.R.; and, finally, correct the mistakes.

    Today’s technology makes the first two steps relatively straightforward. The third, however, can be extremely difficult. For vintage 19th-century texts in English, O.C.R. programs mess up or miss 10 percent to 30 percent of the words. Only humans can fix the errors.

    Dr. von Ahn’s group estimated that humans around the world decode at least 200 million Captchas per day, at 10 seconds per Captcha. This works out to about 500,000 hours per day — a lot of applied brainpower being spent on what Dr. von Ahn regards as a fundamentally mindless exercise.

    each suspicious word is turned into a Captcha. It is crucial to understand that the Captcha is a distorted version of the word as printed in the original photographic image. It is not made from the O.C.R.’s imagined translation, which is often unintelligible. The unknown word is then paired with a second Captcha word whose correct translation is already known. This is the “control.”
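    The pairing described above can be sketched roughly as follows. This is only an illustration of the control-word idea, not reCaptcha's actual code: the function names, the case-insensitive comparison, and the `min_votes` agreement threshold are all assumptions that the excerpt does not spell out.

```python
from collections import Counter


def grade_pair(control_truth, control_answer, unknown_answer):
    """Return the typed guess for the unknown word, or None if the control was wrong.

    Hypothetical helper: only the control word can be checked, so the
    guess for the unknown word is trusted only when the control matches.
    """
    if control_answer.strip().lower() == control_truth.strip().lower():
        return unknown_answer.strip().lower()
    return None


def tally(responses, control_truth, min_votes=2):
    """Collect guesses from users who passed the control.

    min_votes is an assumed threshold, not a documented value.
    Returns the most common guess once it has enough votes, else None.
    """
    votes = Counter()
    for control_answer, unknown_answer in responses:
        guess = grade_pair(control_truth, control_answer, unknown_answer)
        if guess is not None:
            votes[guess] += 1
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


# Three users see the same pair; the third one fails the control word.
responses = [("morning", "upon"), ("morning", "upon"), ("mourning", "open")]
result = tally(responses, "morning")
```

    The point of the design is that the site never needs to know the unknown word in advance: a correct answer on the control is taken as evidence that the same person read the other word carefully too.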

    With all these constraints, reCaptcha nevertheless achieves an accuracy rate above 99 percent, which compares favorably with professional human transcribers.

    So... props to each of you who has been downloading music from file-sharing sites without having a premium account. Together you're digitizing books at a rate of 57 years of human effort per day.
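    For anyone who wants to check the arithmetic, here is a quick sketch using only the numbers from the excerpt (200 million Captchas per day, 10 seconds each). The 365-day year is my assumption; the 57-year figure seems to come from the rounded 500,000 hours rather than the exact product.

```python
CAPTCHAS_PER_DAY = 200_000_000   # from the excerpt
SECONDS_PER_CAPTCHA = 10         # from the excerpt
HOURS_PER_YEAR = 24 * 365        # assumption: non-leap year

# Exact product of the two quoted figures, converted to hours.
hours_per_day = CAPTCHAS_PER_DAY * SECONDS_PER_CAPTCHA / 3600

# Years of effort per day, from the exact hours and from the
# article's rounded "about 500,000 hours".
years_exact = hours_per_day / HOURS_PER_YEAR
years_rounded = 500_000 / HOURS_PER_YEAR

print(round(hours_per_day))      # 555556
print(round(years_exact, 1))     # 63.4
print(round(years_rounded, 1))   # 57.1
```

    So the unrounded numbers give a bit over 63 years per day, and the rounded 500,000 hours gives the 57.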
  • 88Mariner
    My dick is smaller
    • Nov 2006
    • 7128

    #2
    Re: Captcha sites digitizing books

    man lands on moon.
    you could put an Emfire release on for 2 minutes and you would be asleep before it finishes - Chunky

    it's RA. they'd blow their load all over some stupid 20 minute loop of a snare if it had a quirky flange setting. - Tiddles

    Am I somewhere....in the corners of your mind....

    ----PEACE-----


    • feather
      Shanghai ooompa loompa
      • Jul 2004
      • 20894

      #3
      Re: Captcha sites digitizing books

      Pretty cool of Google, but these Captchas make me blind!

      I believe they also did this thing with image search that got people to tag pictures or 'recognise' pictures. So they turned it into a game to entice people into training their algorithms.

      i_want_to_have_sex_with_electronic_music

      Originally posted by Hoff
      a powerful and insane mothership that occasionally comes commanded by the real ones .. then suck us and makes us appear in the most magical of all lands
      Originally posted by m1sT3rL
      Oh. My. God. James absolutely obliterated the island tonight. The last time there was so much destruction, Obi Wan Kenobi had to take a seat on the Falcon after the Death Star said "hi and bye" to Leia's homeworld.

      I got pics and video. But I will upload them in the morning. I need to smoke this nice phat joint and just close my eyes and replay the amazingness in my head.


      • tiddles
        Encryption, Jr.
        • Jun 2004
        • 6861

        #4
        Re: Captcha sites digitizing books

        I use these two for all my bots. I'm sure there's a way to integrate with firefox or something if you're sick of them:

        deathbycaptcha.com
        decaptcher.com

        also this is still kinda accurate:
