Awesomeness: Millions Of Public Domain Images Being Put Online

from the go-use-them dept

Here's some nice news. Kalev Leetaru has been liberating a ton of public domain images from books and putting them all on Flickr. He's been going through Internet Archive scans of old, public domain books, isolating the images, and saving each one as an individual image file. Because, while the books and images are all public domain, very few of the images have been separated from the books and released in a digital format.
To achieve his goal, Mr Leetaru wrote his own software to work around the way the books had originally been digitised.

The Internet Archive had used an optical character recognition (OCR) program to analyse each of its 600 million scanned pages in order to convert the image of each word into searchable text.

As part of the process, the software recognised which parts of a page were pictures in order to discard them.

Mr Leetaru's code used this information to go back to the original scans, extract the regions the OCR program had ignored, and then save each one as a separate file in the Jpeg picture format.
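
The extraction step described above boils down to cropping rectangles out of each page scan. Here is a minimal sketch, assuming the OCR layout data can be reduced to labeled bounding boxes. The `Block` type, its "picture"/"text" labels, and the pixel-grid page representation are hypothetical stand-ins for illustration, not Leetaru's actual code or the Internet Archive's OCR output format.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical page representation: grayscale pixels indexed as page[row][col].
Page = List[List[int]]


@dataclass
class Block:
    """Hypothetical OCR layout block: a label plus a pixel bounding box.

    right and bottom are exclusive, matching Python slice semantics.
    """
    kind: str  # assumed labels, e.g. "text" or "picture"
    left: int
    top: int
    right: int
    bottom: int


def crop(page: Page, b: Block) -> Page:
    """Return the sub-image of the page covered by block b."""
    return [row[b.left:b.right] for row in page[b.top:b.bottom]]


def extract_pictures(page: Page, blocks: List[Block]) -> List[Page]:
    """Keep only the regions the OCR pass classified as pictures."""
    return [crop(page, b) for b in blocks if b.kind == "picture"]


if __name__ == "__main__":
    # A 4x4 toy "scan" whose pixel values encode their position.
    page = [[r * 4 + c for c in range(4)] for r in range(4)]
    blocks = [Block("text", 0, 0, 4, 1), Block("picture", 1, 1, 3, 3)]
    print(extract_pictures(page, blocks))  # [[[5, 6], [9, 10]]]
```

A real pipeline would load each scan with an image library and encode every crop as a JPEG; with Pillow, for example, that is `Image.crop((left, top, right, bottom))` followed by `save(path, "JPEG")`.
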
Already over 2.6 million images have been posted to Flickr in this manner -- all completely in the public domain. From a historical perspective, the images are fascinating -- and the fact that anyone can do anything with them, free of charge, is important culturally as well. Just scrolling through the images is amazing. Here are a few interesting ones that I spotted:

There seem to be lots of images of musical scores, sewing machines, individual portraits, buildings and machinery. Each image's Flickr page gives information about the book it came from, including the text before and after the image, which is pretty cool. The one (only slightly) annoying thing is that on the Flickr pages, rather than saying these are public domain images, it says that there are "no known copyright restrictions." While that's accurate, and a potentially reasonable hedge against some miraculous finding that says these images are covered by copyright, it's really too bad that it's so problematic to come out and say "this is in the public domain, do whatever the hell you want with it."

Reader Comments

  1. That One Guy, Aug 29th, 2014 @ 5:35pm

    Come one, come all, and place your bets!

    While awesome for archival purposes if nothing else, I give it a week at most before some bot starts tagging pictures, demanding they be removed and claiming that at least some of them are still under copyright, followed shortly thereafter (assuming Flickr doesn't just pull them immediately) by the ones running the bot doubling down and insisting that yes, they do indeed own the rights to the images, and that they will file a lawsuit if the images aren't taken down immediately.

    Because when there's absolutely no penalty for copyfraud, well, why not try to claim everything you can, on the off chance that at least some of the claims will stick and/or the target will pay up?

  2. Toestubber, Aug 29th, 2014 @ 7:28pm

    Anyone know a good way to download the originals en masse? This archive seems too important to entrust to Flickr.

  3. s7, Aug 29th, 2014 @ 9:36pm

    Haha, I stumbled upon some of these last night while browsing Flickr for some PD images to use for a project. Redid my search today, and yep, it was them.

  4. jupiterkansas, Aug 29th, 2014 @ 10:20pm

    Re: Anyone know a good way...

    I don't know why they can't be added back to the Internet Archive too.

  5. Anonymous Coward, Aug 29th, 2014 @ 11:38pm

    Oh, boy. Whatever's not going to like this, not one bit.

  6. Anonymous Coward, Aug 30th, 2014 @ 7:02am

    Thanks for the info and the story.

  7. orbitalinsertion, Aug 30th, 2014 @ 7:53am

    In jpeg format?

    Once more, just to help me mentally process this:
    Retrieving and publishing public domain images in jpeg format?

  8. 1st Dread Pirate Roberts, Aug 30th, 2014 @ 12:27pm

    Way Cool!

    This is way cool! This guy needs to patent this technique.

  9. bob, Aug 30th, 2014 @ 2:54pm

    Lather, rinse, repeat

    It's interesting that Leetaru has taken on images. He is a major force behind GDELT, the Global Database of Events, Language, and Tone, which uses automated techniques to mine news sources for event summaries (among other things).

    Unlike GDELT, here all the source material is demonstrably public domain, so publishing the image extracts (in whatever form) should not cause any hiccoughs.

  10. Antsan, Aug 31st, 2014 @ 3:49am

    Unfortunately, there seems to be something strange going on with Flickr. I can't just right-click on the images and save them like I'm used to.
    Would be nice if the pictures were uploaded somewhere where they are more easily accessible.

  11. Anonymous Coward, Aug 31st, 2014 @ 9:06am

    IP extremists will try to argue that this is going to kill art and make all artists starve or something.

  12. Anonymous Coward, Aug 31st, 2014 @ 9:12am

    Would be cool if someone could create a .torrent file of all the images.

  13. Ninja, Sep 1st, 2014 @ 7:54am

    Re: Torrent?

    Free, distributed backup plan. Hell yeah!

  14. NikFromNYC, Sep 2nd, 2014 @ 3:54pm

    There's a small, hard-to-hit three-dot icon that leads to various sizes, including the original, which I can download just fine in an iPhone browser. I just have to zoom in so I don't miss the dots button, since the "next image" hot area irritatingly covers the whole right edge of the image right down to that button.

  15. Victoria Love, Sep 9th, 2014 @ 5:06pm

    Re: saving images

    I was able to isolate and save images by playing around with the "all sizes" option on Flickr. Once the image was displayed without the caption information, I was able to use the "save image" option. This was on my iPad, and I was able to save a single image to "my photos."
