Awesomeness: Millions Of Public Domain Images Being Put Online
from the go-use-them dept
Here’s some nice news. Kalev Leetaru has been liberating a ton of public domain images from books and putting them all on Flickr. He’s been going through Internet Archive scans of old, public domain books, isolating the illustrations, and turning them into individual image files. That matters because, while the books and images are all public domain, very few of the images have been separated from the books and released in a digital format.
To achieve his goal, Mr Leetaru wrote his own software to work around the way the books had originally been digitised.
The Internet Archive had used an optical character recognition (OCR) program to analyse each of its 600 million scanned pages in order to convert the image of each word into searchable text.
As part of the process, the software recognised which parts of a page were pictures in order to discard them.
Mr Leetaru’s code used this information to go back to the original scans, extract the regions the OCR program had ignored, and then save each one as a separate file in the Jpeg picture format.
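For the curious, the approach described above can be sketched in a few lines of Python. This is only a rough illustration, not Leetaru's actual code: the layout XML format, the `block` element, its `type`, `l`, `t`, `r`, `b` attributes, and the function names are all made up for the example. A page scan is stood in for by a small grid of pixel values, and the final step of writing each region out as a JPEG (which would need an imaging library) is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical OCR layout output for one page: each block has a type and
# a bounding box in pixels (left, top, right, bottom). The element and
# attribute names are assumptions for illustration only.
SAMPLE_LAYOUT = """
<page width="8" height="6">
  <block type="text" l="0" t="0" r="8" b="2"/>
  <block type="picture" l="2" t="2" r="6" b="5"/>
</page>
"""


def picture_boxes(layout_xml):
    """Return (left, top, right, bottom) for every block marked as a picture."""
    root = ET.fromstring(layout_xml)
    boxes = []
    for block in root.iter("block"):
        if block.get("type") == "picture":
            boxes.append(tuple(int(block.get(k)) for k in ("l", "t", "r", "b")))
    return boxes


def crop(pixels, box):
    """Cut a rectangular region out of a page stored as rows of pixel values."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]


def extract_pictures(pixels, layout_xml):
    """Crop every region that the OCR layout pass marked as a picture."""
    return [crop(pixels, box) for box in picture_boxes(layout_xml)]


# A tiny stand-in for a page scan: 6 rows of 8 grayscale values.
page = [[10 * y + x for x in range(8)] for y in range(6)]
regions = extract_pictures(page, SAMPLE_LAYOUT)
print(len(regions), len(regions[0]), len(regions[0][0]))
```

The key idea is that the OCR pass has already done the hard part of finding where the pictures are; the extraction step just reads those coordinates back and cuts the matching rectangles out of the original scan.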
Already over 2.6 million images have been posted to Flickr in this manner — all completely in the public domain. From a historical perspective, the images are fascinating — and the fact that anyone can do anything with them, free of charge, is important culturally as well. Just scrolling through the images is amazing. Here are a few interesting ones that I spotted:

Filed Under: book scans, copyright, flickr, internet archive, kalev, leetaru, old books, public domain
Comments on “Awesomeness: Millions Of Public Domain Images Being Put Online”
Come one, come all, and place your bets!
While awesome for archival purposes if nothing else, I give it a week at most before some bot starts tagging pictures, demanding they be removed, and claiming that at least some of them are still under copyright. Shortly thereafter (assuming Flickr doesn’t just pull them immediately), the ones running the bot will double down, insisting that yes, they do indeed own the rights to the images and will be filing a lawsuit if they aren’t taken down immediately.
Because when there’s absolutely no penalty for copyfraud, well, why not try to claim everything you can, on the off chance that at least some of the claims will stick and/or the target will pay up?
Anyone know a good way...
…to download the originals en masse? This archive seems too important to entrust to Flickr.
Re: Anyone know a good way...
I don’t know why they can’t be added back to archive.org too.
Haha, I stumbled upon some of these last night while browsing Flickr looking for some PD images to use for a project. Redid my search today, and yep, it was them.
Oh, boy. Whatever’s not going to like this, not one bit.
Re: Re:
IP extremists will try to argue that this is going to kill art and make all artists starve or something.
Thanks for the Info and story.
In jpeg format?
Once more, just to help me mentally process this:
Retrieving and publishing public domain images in jpeg format?
Way Cool!
This is way cool! This guy needs to patent this technique.
Lather, rinse, repeat
It’s interesting that Leetaru has taken on images. He is a major force behind GDELT, the Global Database of Events, Language, and Tone which uses automated techniques to mine news sources for event summaries (among other things).
Unlike GDELT, here all the source material is demonstrably public domain, so publishing the image extracts (in whatever form) should not cause any hiccoughs.
Unfortunately there seems to be something strange going on on Flickr. I cannot just right click on the images and save them like I am used to.
Would be nice if the pictures were uploaded somewhere where they are more easily accessible.
Re: Re:
There’s a little, hard-to-hit three-dot icon leading to various sizes, including the original, which I can download just fine in an iPhone browser. I just have to zoom in so I don’t miss the dots button, since, irritatingly, the hot area for the next image is the whole right edge of the image right down to that button.
Re: saving images
I was able to isolate and save an image by playing around with the “all sizes” option on Flickr. Once the image was displayed without the caption information, I was able to use the “save image” option. This was on my iPad, where I was able to save a single image to “my photos”.
Torrent?
Would be cool if someone could create a .torrent file of all the images.
Re: Torrent?
Free, distributed backup plan. Hell yeah!