Awesomeness: Millions Of Public Domain Images Being Put Online

from the go-use-them dept

Here’s some nice news. Kalev Leetaru has been liberating a ton of public domain images from books and putting them all on Flickr. He’s been going through Internet Archive scans of old, public domain books, isolating the illustrations, and saving each one as an individual image. That matters because, while the books and images are all public domain, very few of the images have been separated from the books and released in a digital format.

To achieve his goal, Mr Leetaru wrote his own software to work around the way the books had originally been digitised.

The Internet Archive had used an optical character recognition (OCR) program to analyse each of its 600 million scanned pages in order to convert the image of each word into searchable text.

As part of the process, the software recognised which parts of a page were pictures in order to discard them.

Mr Leetaru’s code used this information to go back to the original scans, extract the regions the OCR program had ignored, and then save each one as a separate file in the Jpeg picture format.
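For the curious, here is a rough sketch of that idea, not Leetaru’s actual code. It assumes the OCR step leaves behind a per-page layout file listing each block’s type and bounding box. The XML format, the `type`, `l`, `t`, `r`, `b` attribute names, and the function names are all made up for illustration, and the "scan" is just a grid of values so the example runs with nothing but the standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical OCR layout for one page: each block has a type and a bounding box.
# The element and attribute names are illustrative assumptions, not a real format.
SAMPLE_LAYOUT = """
<page width="8" height="6">
  <block type="Text" l="0" t="0" r="8" b="2"/>
  <block type="Picture" l="2" t="2" r="6" b="5"/>
  <block type="Text" l="0" t="5" r="8" b="6"/>
</page>
"""


def picture_boxes(layout_xml):
    """Return (left, top, right, bottom) for every block marked as a picture."""
    root = ET.fromstring(layout_xml)
    boxes = []
    for block in root.iter("block"):
        if block.get("type") == "Picture":
            boxes.append(tuple(int(block.get(k)) for k in ("l", "t", "r", "b")))
    return boxes


def crop(pixels, box):
    """Cut a rectangular region out of a page stored as rows of pixel values."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]


def extract_pictures(pixels, layout_xml):
    """Crop every picture region from the scan; text regions are skipped."""
    return [crop(pixels, box) for box in picture_boxes(layout_xml)]


# A toy 8x6 "scan": each pixel is labelled with its (x, y) position.
page = [[(x, y) for x in range(8)] for y in range(6)]
pictures = extract_pictures(page, SAMPLE_LAYOUT)
```

In a real pipeline the page would be an actual scanned image, and an imaging library would do the cropping and write each region out as a JPEG file. That last step is left out here.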

Already over 2.6 million images have been posted to Flickr in this manner — all completely in the public domain. From a historical perspective, the images are fascinating — and the fact that anyone can do anything with them, free of charge, is important culturally as well. Just scrolling through the images is amazing. Here are a few interesting ones that I spotted:

There seem to be lots of images of musical scores, sewing machines, individual portraits, buildings and machinery. Each Flickr page associated with the image gives information about the book, including the text before and after the image, which is pretty cool. The one (only slightly) annoying thing is that on the Flickr pages, rather than saying these are public domain images, it says that there are “no known copyright restrictions.” While that’s accurate, and a potentially reasonable hedge against some miraculous finding that says these images are covered by copyright, it’s really too bad that it’s so problematic to come out and say “this is in the public domain, do whatever the hell you want with it.”


Comments on “Awesomeness: Millions Of Public Domain Images Being Put Online”

15 Comments
That One Guy (profile) says:

Come one, come all, and place your bets!

While awesome for archival purposes if nothing else, I give it a week at most before some bot starts tagging and demanding pictures be removed and claiming that at least some of them are still under copyright, followed shortly thereafter (assuming Flickr doesn’t just pull them immediately) by the ones running the bot doubling down and insisting that yes, they do indeed own the rights to the images, and will be filing a lawsuit if they aren’t taken down immediately.

Because when there’s absolutely no penalty for copyfraud, well, why not try to claim everything you can, on the off chance that at least some of the claims will stick and/or the target will pay up?

bob says:

Lather, rinse, repeat

It’s interesting that Leetaru has taken on images. He is a major force behind GDELT, the Global Database of Events, Language, and Tone, which uses automated techniques to mine news sources for event summaries (among other things).

Unlike GDELT, here all the source material is demonstrably public domain, so publishing the image extracts (in whatever form) should not cause any hiccoughs.
