Awesomeness: Millions Of Public Domain Images Being Put Online

from the go-use-them dept

Fri, Aug 29th 2014 05:57pm - Mike Masnick

Here’s some nice news. Kalev Leetaru has been liberating a ton of public domain images from books and putting them all on Flickr. He’s been going through Internet Archive scans of old, public domain books, isolating the images, and saving each one as its own file. That matters because, while the books and images are all public domain, very few of the images have been separated from the books and released in a digital format.

To achieve his goal, Mr Leetaru wrote his own software to work around the way the books had originally been digitised.

The Internet Archive had used an optical character recognition (OCR) program to analyse each of its 600 million scanned pages in order to convert the image of each word into searchable text.

As part of the process, the software recognised which parts of a page were pictures in order to discard them.

Mr Leetaru’s code used this information to go back to the original scans, extract the regions the OCR program had ignored, and then save each one as a separate file in the Jpeg picture format.
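
The extraction step described above can be sketched roughly as follows. This is only an illustration of the idea, not Mr Leetaru's actual code. The XML layout, the `blockType` attribute, and the function names `picture_boxes`, `crop` and `extract_pictures` are all assumptions made for the example, and the "scan" is a small grid of placeholder pixels rather than a real page image. Writing each crop out as a JPEG would need an imaging library and is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified OCR layout output: one <block> per page region,
# with left/top/right/bottom pixel coordinates. Real OCR output will differ.
SAMPLE_LAYOUT = """
<page width="6" height="4">
  <block blockType="Text" l="0" t="0" r="6" b="1"/>
  <block blockType="Picture" l="1" t="1" r="4" b="3"/>
</page>
"""


def picture_boxes(layout_xml):
    """Return (left, top, right, bottom) for every region the OCR marked as a picture."""
    root = ET.fromstring(layout_xml.strip())
    return [
        tuple(int(block.get(key)) for key in ("l", "t", "r", "b"))
        for block in root.iter("block")
        if block.get("blockType") == "Picture"
    ]


def crop(pixels, box):
    """Cut a rectangular region out of a page stored as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]


def extract_pictures(pixels, layout_xml):
    """Go back to the original scan and pull out each region the OCR skipped."""
    return [crop(pixels, box) for box in picture_boxes(layout_xml)]


# A stand-in 6x4 "scan" in which each pixel simply records its (x, y) position.
page = [[(x, y) for x in range(6)] for y in range(4)]
images = extract_pictures(page, SAMPLE_LAYOUT)
```

The point of the design is that no image analysis is needed: the OCR pass has already decided which rectangles are pictures, so the extraction only has to read those coordinates and crop the original scan.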

Already over 2.6 million images have been posted to Flickr in this manner — all completely in the public domain. From a historical perspective, the images are fascinating — and the fact that anyone can do anything with them, free of charge, is important culturally as well. Just scrolling through the images is amazing. Here are a few interesting ones that I spotted:

There seem to be lots of images of musical scores, sewing machines, individual portraits, buildings and machinery. The Flickr page for each image gives information about the book it came from, including the text before and after the image, which is pretty cool. The one (only slightly) annoying thing is that the Flickr pages, rather than saying these are public domain images, say that there are “no known copyright restrictions.” While that’s accurate, and a potentially reasonable hedge against some miraculous finding that says these images are covered by copyright, it’s really too bad that it’s so problematic to come out and say “this is in the public domain, do whatever the hell you want with it.”

Filed Under: book scans, copyright, flickr, internet archive, kalev, leetaru, old books, public domain
