Dear Internet, We Need Better Image Archives

from the The-Public-Domain-should-be-Public dept

Cross-posted from

Dear Internet,

You know what should be really easy to find online? Good quality, Public Domain vintage illustrations. You know, things like this:

Hats / chapéus

I found this on Flickr, where someone claims full copyright on it. That’s copyfraud, but understandable because Flickr’s default license is full copyright (all the more reason to ignore copyright notices!). But copyfraud isn’t the main problem. The main problem is that images like this are painfully difficult to find online, especially at high resolutions (and this image is only available at medium resolution – up to 604 pixels high, which is barely usable for most purposes but higher than much of what you find online).

The images are out there – and with zillions of antique books being scanned, their vintage illustrations are being scanned right along with them. But the images are buried in the text, and often the scan quality is poor. Images should be scanned at high quality, and tagged for searchability.

Are archives ignoring the value of images?

Take the American Memory archive of the Library of Congress. Lots and lots of historical documents here, but no way for me to find an image of, say, a horse.

Most bookscanning projects focus on texts, not illustrations. Many interesting and useful illustrations are buried within these scans, uncatalogued and inaccessible. Scan quality is set for text, not illustrations, so even if one can find a choice illustration buried within, its quality is usually too low to use.

The Internet Archive is great (I love you, Internet Archive!) but does not have an image archive. Still images are not among their “Media Types” (which consist of Moving Images, Texts, Audio, Software, and Education). So I went spelunking through their texts, starting with “American Libraries,” and searched for something easy: “horse.” Surely I could find a nice usable etching of a horse in there somewhere. I eventually found “The Harness Horse” by Sir Walter Gilbey, from 1898.

Nice illustrations! Can I use them? Unfortunately, no. The book is downloadable as PDF and various e-publication formats, but when I try to extract the illustrations, I get a mess (which you can see, after the jump):

Copied and pasted from Adobe Acrobat. WTF?

The same image, inverted. Doesn't work.

"Save Image as..." from Acrobat. This worked, except where it didn't: part of the image is simply missing.

Clearly something is messed up here. Was it just that page? Alas, no:

This sad image from another page has the same problem.

The scans have some flaws that PDFs and Photoshop can't cope with:

Screen grab of zoomed-in view from Acrobat. What looks like a blur in the PDF renders the image unusable when extracted.

These images are not usable, which is a pity because they are very nice illustrations. And they seem to be among the higher quality scans, which again isn't saying much.

Let me add that it's great these books are being scanned at all! That's definitely better than losing them entirely. But as an artist, it saddens me that we're neglecting this wealth of visual art. I'd like to see our rich visual history properly archived. Our bias favoring text over pictures is especially ironic considering how much more efficiently information is communicated to humans through images; "A picture is worth a thousand words," or more. That's why I'm a cartoonist, after all.

I was able to extract one clean image from the book, on page 48:

Unfortunately I can't use this illustration for my purposes, but maybe someone else can. I've already gone through the trouble of finding it in a text, extracting it, and rotating it. If only there were some image archive I could upload it to at high resolution, so someone else could use it. I could tag it, to make it easier to find. I could include all kinds of useful metadata, like what book it was from and when it was published; but even if that was too bothersome, I could at least include tags like "horse," "rider" and "engraving." Wouldn't it be nice if such an archive existed? Wikimedia Commons is close, although I dread uploading things there after having all my open-licensed comics deleted by an overzealous editor. But maybe they're our best hope.
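To make the idea concrete, here is a minimal sketch of what a tagged record might look like. The `ImageRecord` class, its field names and the `find_by_tags` helper are hypothetical illustrations, not the schema of any existing archive; only the book title, year and tags come from the example above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ImageRecord:
    """One archived illustration. All field names are hypothetical."""
    description: str
    source_book: Optional[str] = None
    year_published: Optional[int] = None
    tags: Set[str] = field(default_factory=set)


def find_by_tags(records: List[ImageRecord], *wanted: str) -> List[ImageRecord]:
    """Return the records that carry every requested tag (case-insensitive)."""
    wanted_tags = {tag.lower() for tag in wanted}
    return [
        record for record in records
        if wanted_tags <= {tag.lower() for tag in record.tags}
    ]


records = [
    ImageRecord(
        description="Illustration from page 48",
        source_book="The Harness Horse",
        year_published=1898,
        tags={"horse", "rider", "engraving"},
    ),
    # A record with tags only, for when fuller metadata is too bothersome.
    ImageRecord(description="Untitled scan", tags={"hats"}),
]

for record in find_by_tags(records, "horse", "engraving"):
    print(record.description, "-", record.source_book, record.year_published)
```

Even the bare-minimum record (a description plus a few tags) is enough to make an image findable, which is the point: fuller metadata is a bonus, not a requirement.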

Continuing my searches on, I found this ostensibly Public Domain, vintage horse book with line illustrations. Unfortunately this is controlled by Google Books. It's "free" to read online in Google's reader, which doesn't allow any image export. It also doesn't allow me to zoom in.

All those illustrations, trapped at low resolution, unusable (even if they were tagged/catalogued, which they aren't). This is our "Public Domain." Who exactly is benefiting from having these 18th Century illustrations inaccessible to today's artists?

Then there's Dover Books. I loved Dover books growing up - they introduced me to the idea of the Public Domain. Dover reproduces vintage illustrations in books for artists and designers. Their paper books were reasonably priced, and you could use the illustrations for anything, without restriction. Browsing was free, so I would flip through the pages in the book store, and if it had what I needed, I'd buy it.

Dover is still selling books, but the prices are now relatively high, few are carried in bookstores, and they prohibit browsing online. You have to shell out $15 to find out if what you need is in the book, and how could you know? They seem to be clinging to an outdated copyright model, and rather than selling things of added value, they are simply blocking access to existing Public Domain works, in order to collect a toll.

What else has kept a good public archive of Public Domain images from existing? Some artists and archivists do make high quality scans of vintage illustrations - and keep them to themselves. I guess we could call this "image hoarding." I assume the reasoning is, "I went through all the trouble to scan it, why should I share? Others can pay me if they want a copy." Also there's the "finders, keepers" reasoning: "anyone else is free to find the same illustration in another antique book, but I found this one, so it's mine." And so these images remain inaccessible, not part of any public archive.

Wikimedia Commons is the best public image archive I know of right now. A bit of searching led me to their "Engravings of Horses" category, which yielded some nice images. Unfortunately, many of these are not available at sufficiently high resolutions.

File:Fotothek df tg 0005647 Nutztierhaltung ^ Tiermedizin ^ Pferd ^ Krankheit.jpg

The maximum size of this image is 800 × 608 pixels, which limits its use. Limited image sizes and limited selection have been the biggest obstacles to my relying more on Wikimedia Commons; but it can get better. Maybe it will. It would be nice if something became the public vintage image archive I and so many other artists need.

Comments on “Dear Internet, We Need Better Image Archives”

Marcus Carab (profile) says:

Re: Re:

I bet if someone could charge monopoly rates for access to high-res images of such public domain works, you wouldn’t have as much trouble finding them.

That’s true, but then they would cease to be public domain, which kind of kills the point of the whole thing.

Still, I do see what you’re saying. But what I don’t understand is why your mind goes straight to selling “access” to non-scarce goods – the hardest thing to put a price on in the digital era. It’s like you have a mountain full of gold, and instead of mining it you decide to tax everyone in the countryside for looking at it. Why not focus on selling the scarcities? There are a few that seem obvious right off the bat:

– High-res scanning services
– Manual vectorization services
– Archive/library/research services (“I need you to find me high-res engravings of horses, here are my requirements…”)
– Printing/mounting/framing/canvas-transfer/etc (we are talking about a wealth of artwork just waiting to be tapped)

I bet there are more too – including some pretty clever and disruptive ones. But to figure them out you’d have to put your mind to it, instead of relying on outdated laws in the hopes of barely lifting a finger.

Anonymous Coward says:

Re: Re: Re:

I fail to see how selling any of the services you refer to would result in greater access to high-res copies of public domain works (though they may very well be a good business for the proprietor).

I’m not trying to make an argument for doing away with the public domain, but I find it amusing that Nina would, without a hint of irony, bemoan how hard it is to get access to good quality copies of things in the public domain.

I think the incentive not just to create, but to distribute, market, publicize, etc. works is one of the most often underappreciated incentives of copyright protection.

Anonymous Coward says:

Re: Re: Re:5 Re:

Well, the tool makers made their part, now is up to people to start using those tools 🙂

Besides if people really want to store images they can find solutions.

Make a high resolution video of still images and upload to, upload to, upload to the other dozen websites that accept PD material, use Flickr and let search engines index those images, use distributed storage, use distributed websites that are hard to kill by any government unless they can remove all copies from all over the world.

Did you see the size of the list of places where one can find PD material?
2 years ago you could just count them with your fingers; now it has grown to more than a hundred places.

You know what that means right?

Anonymous Coward says:

Re: Re: Re: Re:

You don’t want others to know that there are free alternatives to your way of incentivizing things, because people are already doing it in a variety of forms.

– Using social images sharing websites like Flickr.
– Torrents.
– Dedicated PD websites, of which there are more than one.
– Creating a movie with still images and uploading to
– Using distributed storage solutions.

The real problem is not that there are no solutions; the problem is that there is no marketing involved, so few people know about them.

But with others bringing attention to the issue that soon may change.

Anonymous Coward says:

Re: Re: Re:3 Re:

About the problem, it appears that it will not be one for much longer, as sources for free images are popping up everywhere.

If you tried to search that just a few years ago you would find just a dozen or so, now it is a list containing ten times what you would have found before.

There is something driving that.

Marcus Carab (profile) says:

Re: Re: Re: Re:

I fail to see how selling any of the services you refer to would result in greater access to high-res copies of public domain works (though they may very well be a good business for the proprietor).

– Scanning and vectorizing: customers pay to have this done so they can be the first to use something that has never been digitized (an illustration they’ve found in a book, for example), but the work is PD so once the deed is done it increases access for everyone. Thus a core group of people who hunt out source material incrementally increase the wealth of quality PD images by funding digitization on an as-needed basis

– Archive/library/research: if there proves to be a demand for such services (and I’m not saying I’m certain, but it seems likely) then naturally that incentivizes whoever offers those services to constantly increase their database of PD works, in order to make curation all the more valuable.

– Printing: again, if there proves to be a demand for printed reproductions of PD artwork, that incentivizes the creation of larger databases of such artwork and increased access to those databases in order to drive sales

Marcus Carab (profile) says:

Re: Re: Re:3 Re:

Well, actually I meant both scanning and vectorizing as two entirely separate things – as in my original comment 🙂 I do know what you are talking about with vectorized etchings – however I have found that it’s not so bad depending on what level of detail you are talking about, and can still have certain advantages. However, I also wasn’t really limiting my thoughts to this specific type of etching, but public domain artwork in general.

Marcus Carab (profile) says:

Re: Re: Re:3 Re:

btw, has some cool stuff – but it’s a very small collection, not a big archive. Fun though – and a pretty good gauge of what we are talking about, because some of the vectorized etchings are indeed too detailed and hard on the computer, while others are quite nice to work with (though of course this depends to some degree on the computer in question, too)

Nina Paley (profile) says:

Re: Re: Re:4 Re:

Thanks for the link to I downloaded this caloric engine illustration. The vector file is 12.6 MB. The high res photoshop file is 11.3 MB. This is a fairly typical etching, and the vector version is larger than the (large) raster. It would be a monster to work with in any of my vector graphics programs; I’d get minutes of the “spinning rainbow” if I tried to edit it. Of course this isn’t true for all images, and I respect vintagevector’s handling of their images – breaking them into smaller chunks, for example, as in the case of these highly detailed border parts.

Ed C. says:

Re: Re: Re: Re:

Wait, I thought the argument for copyright was that without protection, works could be copied with reckless abandon? Actually, if you tried to think about it for even a moment, you would realize that copyright has nothing to do with the need to distribute, market, publicize, etc; everyone has those costs, regardless of whether you own the copyright or not. The only difference that copyright makes is the need to pay someone else before doing any of the above.

Anonymous Coward says:

Re: Re:

No need. We didn’t need monopolies for archivers to catalog every other kind of public domain work, no reason why we can’t do the same for images.

In fact, resources are listed in the comments below that do this very thing.

Yet the maximalists will still try to claim that this is a failure of the sharing model. *sigh*

Anonymous Coward says:

Re: Re: Re:

I think sharing PD works is great. I just think that you’re not as likely to get as much effort put forth to such work based on nonprofit goodwill toward man as you would as toward an effort based on cold, hard, greedy, money-grubbing profit.

Hell, you can’t escape popular music. It’s everywhere. I don’t think that’s because of its inherent qualities. I think that’s because people stand to make a lot of money by making such music popular.

Anonymous Coward says:

Re: Re: Re: Re:

Hmmm…I can escape popular music, movies and other media, I live under a rock.

I don’t know who is hot or not, what is hot on TV or in the theaters, I have to go to torrentbuttler to see what is new every time I need to be snarky about piracy.

About the odds of having people do something about it, well open source just proved that people can put as much effort as multi billion dollar companies can.

Anonymous Coward says:

Re: Re: Re: Re:


Apparently open source is a growth market and they give all their secrets for free and still manage to make money.

Likely someone will see a potential market in PD and fill in the need for services in that area.

Anonymous Coward says:

Re: Re: Re:3 Re:

10 minutes of search, maybe my Google-fu is just better but I don’t see a problem finding anything PD out there.

Finding those is not a problem at all. The problem is that they are all over the place: there is no one place yet that got big and that everybody uses to find things. But give it time; it will happen, and then you will see people all over the world sending in a lot of material to it.

Anonymous Coward says:

Re: Re: Re: Re:

Wait… what?

Since when did I talk about popularity?

The thing that matters is that the content is there if you want it and are willing to look, not whether you take the internet up on that offer or not.

Mass sharing/archiving of PD/open source/CC-BY-(SA) works happens all the time, even though it’s not well advertised on the whole.

Sure I used to ignore free/open source content at first because the descriptions or hype were underwhelming, but found myself very happy with such things once I gave them a chance.

The fact I could get what I wanted from freely shared works is a huge success in my book.

bob (profile) says:

Re: Re: Actually there is a reason

Images are one of the old data formats on the web, much older than music or video. If such a repository was going to be created by the good will of people, it would already exist.

But there’s a problem here. Scanning takes work. It’s not like ripping a file or copying it with an automated program. Someone has to pick something up and put it in a box. It’s much harder to share these things.

And don’t be fooled by the vast array of pirated material. There are reasons to believe that companies seed it to increase their revenues from people who pay for access to the pirated material. They’re running stores, they’re just not sharing anything with the artists. Many of these aren’t the grassroots efforts that you would like to believe.

Trust me. Big Piracy is a big business. If they want people to come back each month and pay for access, they’ve got seeders putting up the stuff.

Also don’t be fooled by the existence of open source software. Many of the companies that release software to the open source stacks are doing it for selfish reasons. They share with other programmers because they hope that the other programmers will do some of the work and share the development costs. It often works for some areas.

But don’t think it works for all. There are precious few open source games and it looks like open source productivity software could be heading south now that Sun/Oracle is giving up on getting anyone to pay.

Face it. The cases when people share successfully are rare. I’m pretty sure that the free scanning services that Mike would like aren’t going to appear any time soon.

Richard (profile) says:

Re: Re: Re:3 Actually there is a reason

Take a look at the change logs. Trust me. The work isn’t getting done any more because the workers aren’t getting paid by Sun’s dream of selling hardware with the free software. There’s a reason why Oracle took one look at this market and ran!

Actually, although Sun did a lot to push Oo in the early days – by the end their behaviour had become a problem. When Oracle took over they made things a whole lot worse and caused the community to split.

It was only after the split that they realised that what they had left themselves with (as a result of their own bad actions) wasn’t worthwhile and bailed out.

Now when a movement has a split like that there will be some fallout and it will take time for things to recover. It follows that your observation is not sufficient to support your conclusions – and hence there is no reason to trust you.

Marius (profile) says:

You might want to try Adobe Acrobat Professional.

It has an export function which would allow you to export each page (as shown on screen) to a PNG, TIFF, JPG or other formats at various DPI values.

For example, you could export it to a 600dpi PNG file, which would probably give you a 4000 by 2000 image or even larger.
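As a rough check on those numbers: the exported pixel count is just the page size in inches multiplied by the chosen dpi. The sketch below assumes a hypothetical 5 × 8 inch page (the book's real page size isn't given here), and `export_pixels` is an illustrative helper, not an Acrobat function.

```python
from typing import Tuple


def export_pixels(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions of a page rendered at the given dots per inch."""
    return round(width_in * dpi), round(height_in * dpi)


# Hypothetical page size in inches; substitute the real dimensions of the page.
PAGE_WIDTH_IN, PAGE_HEIGHT_IN = 5.0, 8.0

for dpi in (150, 300, 600):
    width_px, height_px = export_pixels(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, dpi)
    print(f"{dpi} dpi -> {width_px} x {height_px} pixels")
```

Keep in mind that exporting at a higher dpi than the original scan only makes a bigger file; it can't add detail the scan never captured.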

I know I’ve used this to export scanned newspaper articles, to import them in OCR software later on (these programs require high dpi images)

It is an expensive piece of software though, other pdf readers may be able to do the same thing but I can’t vouch for any.

Anonymous Coward says:

Google has options to search images by type of license and size.

For being able to upload and thus archive those findings.

Found it here

There are several websites dedicated to PD content apparently for images and some let users upload images.

But there are also other solutions you could also use Flickr and let search engines do the work like this one.

It indexes only images with liberal licenses that let you use it or PD, at least that is what it says.

out_of_the_blue says:

"I'd like to see our rich visual history properly archived."

You need an archivist. But as I pointed out in one of the JSTOR threads, the advocates of “free” here aren’t willing for librarians or archivists to get any income from their services. They’d rather sneak in and “liberate” the data, oblivious to the efforts of scanning and classifying: JSTOR can beg for contributions. You should re-consider your stance on “free”, as you’re simply wishing for people to give their time for your possible convenience.

Not intended as overly personal or emphatic. I’m sure you’re a “good” person, in short. But those services don’t come for free.

My solution to the problem of paying archivists and librarians is gov’t subsidies, and the argument concludes that it’d be far better spent than on killing people in needless wars.

Nina Paley (profile) says:

Re: "I'd like to see our rich visual history properly archived."

the advocates of “free” here aren’t willing for librarians or archivists to get any income from their services.

Actually I’d very much like some of the funding that currently exists for text, audio and motion picture archives to go towards making a PD image library, at least of black and white line illustrations, etchings, engravings and woodcuts. People need to be paid for something like that to work. The Library of Congress pays its staff; most archives have professional staff that are paid. But image archiving isn’t valued the way text archiving is, and so it isn’t funded as well. I assume most funders just don’t think there’s a need for it. I’m pointing out that yes, there is a need. A funded archive could include contributions from unpaid participants as well, but I don’t think a proper image archive is going to happen without some real money.

bob (profile) says:

Re: Re: "I'd like to see our rich visual history properly archived."

I think you’re just being cynical. If the images just manage to Connect with their Fans and give them a real Reason to Buy, the Image Archive is going to be flying high! Don’t be cynical and talk about money. This is the Internet. All of the cool dudes are going to be running to copy this stuff for you because they’re so grateful that they were able to snarf some free MP3s. Yup. That’s how the web rolls all right.

Anonymous Coward says:

Now if people want distributed storage on the cheap to store petabytes of information maybe things like OMEMO are the solution.

It is encrypted and fairly anonymous.

There are other projects that are less known like Osiris Serverless Portal that don’t even depend on DNS on the normal internet.

Microsoft has their own distributed tech too.


Brock Phillimore (profile) says:

I have a family bible that’s over 100 years old. It’s nearly a foot thick with over 1500 pages and as many illustrations. It has both the old and new testaments side by side and a huge concordance (index) at the back.

The last copyright I can find in it is from 1890. Could I scan the pages and put them into some public domain web site or does copyright still stop me from doing that?

Anonymous Coward says:

Re: Re:

If you need help scanning here are some cheap solutions: (10 cheap solutions from building a book scanner to smartphone based scanner apps)

Beware that digitizing the book is just part of the process, you need to pass it through a scanning post processing program to fix alignment, vignetting and other issues and probably use an OCR(Optical Character Recognition) software to make it searchable. But just the raw images are mighty fine, I’m sure others can group together to do the rest to get a perfect digital copy off of it, for the post processing part, OCR and proof reading of the OCR output.

Here are 2 places where people help others do it.

bob (profile) says:

Can you say "death of the commons"?

I can.

So Mike, why don’t you step up and run this? Isn’t that how it’s supposed to work? You’ve got a great idea. You’ve got the vision. So jump on it! I’m happy to support you by using it to illustrate my blog and skip paying real artists. I’ll even fill out some review somewhere that says you’re really great.

Oh what? You need some help doing the work. Don’t worry. I’m sure someone’s going to step forward. There are tons of cool artists and they’re all pissed off at the man and those big corporate machines that only give them a small percentage of what their art is worth. They’ll be rushing over to help you because zero is somehow better than a small percentage.

So stick it to Dover and their outdated business model. Show us how it’s done for free. Maybe you can get Silent Bob to tell you how to create a paywall and collect a toll without calling it a paywall or a toll. Yeah. That’s the ticket. Just change the words.

But whatever you do, blame evil copyright for creating this death of the commons. I know that all of the public domain work is free of copyright and it’s going to take a bit of work to actually blame copyright, but I’m sure that somehow you’ll find a way to blame Righthaven or the RIAA or the artists who somehow want to cash a check from time to time.

Go for it. Then get to work showing us how it’s done.

ChurchHatesTucker (profile) says:

Re: Can you say "death of the commons"?

So stick it to Dover and their outdated business model. Show us how it’s done for free.


I’ve got some Dover CDs. Let’s take the scans, re-title them, and add metadata. The US has no ‘sweat of the brow’ BS, and we’d be making our own compilation, so we should be good to go.

RobShaver (profile) says:

Re: Unfortunately ...

“Unfortunately the GmailFS project has come to an end. libgmail has ceased being maintained by its developers, and as a result libgmail no longer works with the latest Gmail interface (and has not done so for many weeks). Without a working libgmail, GmailFS does not function, so the end of libgmail also spells the end of GmailFS.”

RobShaver (profile) says:

Re: Re: Unfortunately ...

Oh, and “Note that Google’s terms of use prohibit the use of their services by any automated means or any means other than through the interface provided by Google. These restrictions would make use of GmailFS a direct violation of the Service agreement.”

As we know from reading TechDirt, violating any company’s terms of service is now a criminal offence … so maybe it’s good that GmailFS doesn’t work any more.

Anonymous Coward says:

seems the complaint is someone else hasnt done the work to make the scans you want, but you cant be bothered to go scan them yourself, quit complaining

dover doesnt have an outdated business model, those people do this thing called ‘work’ to make those PD images useable for others, if they posted them all online so you can “see” them first, people would just copy them and not pay dover for the time they invested in making them, but you dont seem to care about that

RobShaver (profile) says:

Re: No, she's been too busy ...

making original art, in high resolution, and putting that into the public domain. Things like the full length animated feature film, “Sita Sings the Blues” and the full set of eleven images, “The Avatars of Vishnu”, in vector format so they have infinite resolution, and many many others.

So what have you, Mr. Smug AC, contributed to the Public Domain this year? Huh? What did you say? Nothing! I’m shocked.

Instead, like me, you’re reading about being creative. (Oh, I’d better get back to my little video editing project.)



Anonymous Coward says:

Re: Re: Re:

I dunno, using a bump map perhaps?

Using a normal map?

Warping textures?

Acquisition of topology information through 3D scanners, for storing historical data like people doing it for Cuneiform tablets.

This last option can be used if you really want to save historical data.

Now I don’t know what she wants the horsie for. Is it to learn something by looking at how it was done? Is it to recreate that exact same thing for some sort of preservation? Or is it to use it as a template for something else?

If it is for use as a template, you don’t need high resolution images to etch anything. You can probably make one from 256×256 pixels and create interference patterns on the drawing using modern tools available in Gimp or another image editor, print the result on a laser printer and heat transfer that to any surface you want. Then use chemical etching techniques, just like etching a PCB, or use sandblasting, or a 3 axis CNC machine, or carve it by hand.

You can create any pattern as detailed as you want using modern image editors.

RobShaver (profile) says:

I had no problem ...

getting a fairly nice high resolution copy of that horse.

I’m not disagreeing with you about the Public Domain … I do think it’s a travesty that nothing goes into the Public Domain any more … in fact stuff is getting removed.

Here’s a little video of the first way I tried (which I created using the free version of Jing):

Next I downloaded the PDF and opened it in Adobe Acrobat Reader v.10.1.1. I went to the page, rotated it 90 degrees and then used Jing again to capture a still image which you can see here:

Next I futzed around using Adobe Reader to make the picture as big as I could on my screen. Then I used Jing to capture this larger image and saved it to my local disk. I opened it in Gimp and found that the pixel size is 1350×9034. So I zoomed into the picture where you had seen much blurring and took another snap with Jing. I think mine looks much sharper than yours. Here’s the link to it.

Of course you are limited to your screen resolution when capturing from your screen. Even 1920×1080 isn’t really high enough for good print design of any size.


p.s. I’ve been enjoying planting your Intellectual Pooperty pamphlets in some strategic place.

Nina Paley (profile) says:

Re: I had no problem ...

Very nice. But since the scans were originally captured as image files, wouldn’t it be sensible if they could be obtained as such, rather than being converted back and forth? Can you imagine going through that for every image, when it’s totally unnecessary?

Fortunately, Rick Prelinger left this comment on my blog:

You can easily download the still images from which Internet Archive PDFs were derived. There should be a link on the left side to “All Files,” right with the links to the various versions. You will see a menu, and what you want is the .zip of all the .jp2 files. It’s usually a large download, but you will then have each page in much better resolution and quality.

It’s not quite that simple, but close. I replied:

Thanks Rick. The files listed for “The Harness Horse” are:
(2.8 M)PDF
(2.2 M)B/W PDF
(~72 pg)EPUB
(~72 pg)Kindle
(~72 pg)Daisy
(47.6 K)Full Text
(1.5 M)DjVu

Below that, there is “All Files: HTTP”. When I clicked that, I got a list of all kinds of things – and one was indeed .jp2 zip! Now that I know what it is, I can use it. But it’s very hidden! And we still don’t have an image archive, although poring through .jp2 files and cleaning up and tagging images found therein could be a way to contribute to one.
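Once that .jp2 zip is downloaded, pulling out the page images can be scripted. This is a minimal sketch using only Python's standard `zipfile` module; the function names and the example file name in the comment are hypothetical, and it assumes the zip simply contains one .jp2 file per page.

```python
import zipfile
from pathlib import Path
from typing import List


def list_jp2_pages(zip_path) -> List[str]:
    """Names of the .jp2 members inside the zip, in sorted order."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(".jp2")
        )


def extract_jp2_pages(zip_path, out_dir) -> List[Path]:
    """Extract only the .jp2 members into out_dir and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    names = list_jp2_pages(zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out, members=names)
    return [out / name for name in names]


# Example use (the file name is hypothetical):
#   pages = extract_jp2_pages("harnesshorse_jp2.zip", "pages")
#   print(len(pages), "page images extracted")
```

The extracted files are JPEG 2000 images, so you still need an image editor or converter that reads that format before you can crop out and tag individual illustrations.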

Anonymous Coward says:

Re: Re: I had no problem ...

The first option that you didn’t list was “read online”. If you had chosen that one you would have been able to save a JPG copy, although smaller than the original.

That is because it comes from the reduced images in the ZIP and not the originals in the TAR.

Anonymous Coward says:

Re: Re: I had no problem ...

High resolution image from pixels x 1934 pixels)

Going to “read online” then clicking on the zoom button inside the page until it doesn’t zoom any more and saving the image.

Rekrul says:

Re: Re: I had no problem ...

Very nice. But since the scans were originally captured as image files, wouldn’t it be sensible if they could be obtained as such, rather than being converted back and forth? Can you imagine going through that for every image, when it’s totally unnecessary?

As I understand it, PDF has its own image compression methods, and all images are converted to one of them when the PDF file is created. I believe that there are both lossy and lossless compression methods. So when the images are extracted from a PDF file, they have to be converted to a normal format anyway. If they were stored in a lossless format and extracted to a lossless format, then you get an exact copy of the data. However, if they’re extracted to JPEG format, or were stored in a lossy format, you’ll never be able to get an exact copy of the original file.
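The difference between the two kinds of compression can be illustrated with a toy sketch. Here `zlib` stands in for a lossless method, and a crude quantizer stands in for a lossy one. The quantizer is an assumption made purely for illustration: it is not JPEG and not what any PDF tool actually does.

```python
import zlib


def lossless_roundtrip(pixels: bytes) -> bytes:
    """Compress and decompress with zlib; the output is bit-identical."""
    return zlib.decompress(zlib.compress(pixels))


def lossy_roundtrip(pixels: bytes, step: int = 16) -> bytes:
    """Toy lossy codec: round each sample down to a multiple of `step`."""
    return bytes((p // step) * step for p in pixels)


original = bytes(range(256))                     # stand-in for pixel data
print(lossless_roundtrip(original) == original)  # True: exact copy
print(lossy_roundtrip(original) == original)     # False: detail is gone
```

Once the low bits are discarded in the lossy path, no later conversion can bring them back, which is the point made above about never recovering an exact copy.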

Fortunately, Rick Prelinger left this comment on my blog:

That is easier, however it will only work for PDF files obtained from the Internet Archive. Getting a file from anywhere else still leaves you with the problem of how to get the files out of the PDF file.

Nina Paley (profile) says:

artists and techies

More comments than I expected on this article. One thing is clear: TD commenters are not well versed about graphics and how artists use them.

This may explain why there are no really good public image archives online: the leaders of public/open source projects are mostly techies, who (in general) don’t understand images so well. And most visual artists, who do understand images, tend to cling to proprietary models and disdain public archives.

Karl (profile) says:

Re: artists and techies

the leaders of public/open source projects are mostly techies, who (in general) don’t understand images so well.

Nor music, unfortunately, which is why Open Source music software has been about ten years behind the times.

Fortunately, things are getting a lot better, very quickly. I’m sure if you got some actual graphic artists on board with this, things would eventually take off.

Perhaps some kind of SETI@home type deal? It’s kind of what Wikimedia Commons does, but there should be a service that is focused mainly on the images themselves.

Or, perhaps, some sort of incentive for book stores (those that are left) to help out? Scan in the drawings from a PD book, and you can have some sort of “sponsorship” ad on the site, or something.

Anonymous Coward says:

Re: artists and techies

TD commenters are not well versed about graphics and how artists use them.

This may explain why there are no really good public image archives online: the leaders of public/open source projects are mostly techies

The first may be true, but does not explain shit. I bet the largest reason is because there is no money in trying to appease a niche group, who also happens to be ridiculously anal and arrogant about stuff that nobody but the niche group notices or cares about. It seems more likely that

And most visual artists, who do understand images, tend to cling to proprietary models and disdain public archives.

explains that they are far less technical (meaning: can’t figure out how to load an image and right click), than the technical people are visual artists.

Anonymous Coward says:

Re: artists and techies

I think this is the heart of the problem. Artists can’t wait for other people to give them what they want. The great thing about the internet is if you see a need for something, you can fill that need yourself. You probably have the clout to start such an image archive and get a group of people going filling it with quality scans.

It would be a vintage clip art collection for artists and designers. I would love to have such an archive for design work too.

I tried the Smithsonian – not much luck with high quality images there either.

Anonymous Coward says:

Re: artists and techies

Nice comments there Nina. Why not just call them fucking morons?

The truth is people who live a leeching life rarely learn the tools to actually make anything for themselves. You have made it all the way up to bad cartoons, which puts you in the top 1 or 2 percent on this site, considering most people here (like the talentless schmuck Marcus Carab) think that taking something and chanting bad poetry over it is somehow “art”.

Don’t be lazy – if there is a need, make it your life’s work. Give up your time and really give back to the Tardian world. Stop all this other stuff you are wasting your time on, and give back to the community that has so well rewarded you by ignoring all your previous works.

Marcus Carab (profile) says:

Re: Re: Re: artists and techies

In his mind, it is better to not create anything at all than to risk creating something that isn’t 100% “original” or something that a random anonymous weirdo on the internet might (gasp!) make fun of you for.

We should cut him some slack though – when the human fire in your belly is all but extinguished by bile and uncle-sperm, the world must seem like a cruel ironic place: so many people walking around, mocking you, making it look so easy to just be happy and not have dicks in their mouths while you struggle with the mystery of how that is accomplished. It must be a sad little life in his basement, with nothing to keep him company but porn blogs and photos of Mike with the eyes scratched out – he deserves our pity more than anything.

Anonymous Coward says:

The Internet Archive apparently has the original images for all their books. This one uses JPEG 2000 compression, which may be the cause of the fusing in the PDF format.

Using openjpeg to extract the image to TGA, I got a 30MB file that looks clear to me here:

j2k_to_image -i /home/thepirate/Documents/harnesshor00gilb_orig_0006.jp2 -o /home/thepirate/Documents/harnesshor00gilb_orig_0006.tga

If the DJVU came from the Internet Archive, there are often high-quality JPG files that are viewable online (go to the details page, choose “read online”, and from there you can increase the size of the image, then right click and save an image). This is in fact easier than ripping from DJVU, as you don’t have to mess around with a screenshot and trimming the image, and the resulting quality is hugely better.

Anonymous Coward says:

“I assume the reasoning is, “I went through all the trouble to scan it, why should I share? Others can pay me if they want a copy.”

That reasoning is common in the real world, where the rest of us live. If someone takes the trouble to do painstaking work and expects to be compensated, why is that a bad thing? Google scanned a great deal of books and put them up online, because they have the effing money, and stand to make more money from their efforts.

Instead of whining about the lack of images, why don’t you do something about it? Start a project. Who knows, Google might acquire it.

“Anyone else is free to find the same illustration in another antique book, but I found this one, so it’s mine.” And so these images remain inaccessible, not part of any public archive.”

Because no one wants to work for free. At least in this area. Maybe someone will read this post and take the cue, pumping in a great deal of money, energy and enthusiasm to create a wonderful free online archive that everyone has instant access to.

(profile) says:


When we rebuilt Encyclopedia Dramatica we got a huge number of the articles from (We love you too!), but we lost sooooo many images. We are still missing something like 60 thousand image files, and many more of the files that we have are only thumbnails we pulled from Google cache snapshots of ED articles. The Internet needs an image archive database. I guess it comes down to who would pay for it. I would donate whatever server space I could just to make sure that something like this never happens again, and I know many others would do the same.

Gaz Davidson (user link) says:

Here it is

The Internet Archive allows you to export directly via the web interface, just right click on the image, copy the URL, then edit it to change the rotation and scale:
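The URL from this comment is not preserved here, so the following is only a sketch of the general technique of editing an image URL's query string. The host `example.org`, the parameter names `scale` and `rotate`, and the helper `set_query_params` are all assumptions for illustration, not the confirmed form of the Internet Archive's URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_query_params(url, **updates):
    """Return `url` with the given query parameters added or overwritten."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update({key: str(value) for key, value in updates.items()})
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical URL: the real one copied from the page viewer would go here.
example = "https://example.org/image?file=page_0006.jp2&scale=4&rotate=0"
print(set_query_params(example, scale=1, rotate=90))
```

In this sketch a smaller `scale` value is assumed to mean less downscaling. Whether the real viewer works that way would need to be checked against an actual copied URL.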

Will Sandberg (user link) says:

You can capture the picture from the original source.

the URL of the picture shown is:

Unfortunately, the process of getting to it is cumbersome, and they are saved in JP2 format, which apparently no one can access except through their site one page at a time.

Panos says:

If it isn’t obvious by now, many corporations see value in public domain works, so they will do everything to make sure their version is far superior to what’s available online. It’s text and monochromatic images. Most of it would be top quality at 600dpi 1 bit lossless TIFF/PNG/ETC at 50-200kb/page. But no, they make sure lossy compression is used and that even the GB sized files are useless for getting better quality.
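As a rough check on those figures, here is a sketch of how much compression a 1-bit page at 600 dpi would need to land in the 50 to 200 KB range. The US letter page size (8.5 by 11 inches), the reading of "kb" as kilobytes of 1000 bytes, and the helper name `raw_bilevel_bytes` are all assumptions. Row padding is ignored.

```python
def raw_bilevel_bytes(width_in, height_in, dpi=600):
    """Approximate uncompressed size in bytes of a 1-bit-per-pixel scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px // 8  # 8 pixels per byte, no row padding


raw = raw_bilevel_bytes(8.5, 11)  # assumed US letter page
print(f"uncompressed: {raw} bytes")
for target_kb in (50, 200):
    ratio = raw / (target_kb * 1000)
    print(f"{target_kb} KB/page needs about {ratio:.0f}:1 compression")
```

Under these assumptions the raw page is about 4.2 MB, so the quoted range corresponds to lossless compression ratios of very roughly 21:1 to 84:1.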

Anyone can do better quality with a scanner or even a camera at home. They do keep the high quality images and serve low quality that also includes their own ads in every page. How much would this kind of advertising cost in the real world? Millions and millions. One impression per page for each book?

They scan the images for their own use, get advertising (which pays a lot more than their scanning cost) and win in all possible ways.
