Search Engine Cache Isn't Copyright Infringement

from the good-news-for-search-engines dept

There are some out there who have suggested that search engines such as Google and Yahoo are basically just massive copyright violators, because they scan, index and keep an archive of websites. That copied archive (usually called a cache) is, according to these commenters, an unauthorized copy. Now a court has basically destroyed that argument, noting that putting content online grants an implicit license for search engines to index and copy it. The lawsuit also claimed that individuals who visited the cached version were infringers, but the court rejected that argument as well, holding that the implied license extends to those users. The only part of the case that seems to be moving forward is whether this implicit license was revoked once the lawsuit started and the search engines still didn’t take down the content. The idea there is that any explicit notification by the content holder might override the implicit license, and thus the search engines should have taken down the content as soon as the lawsuit started (since filing it signaled an explicit revocation of the license). Of course, the whole thing seems pretty silly. If the guy didn’t want his content indexed, he should learn what a robots.txt file is for.
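
For readers who have not run into robots.txt, here is a minimal sketch of how a well-behaved crawler might consult it before fetching a page, using Python's standard `urllib.robotparser`. The rules, the `ExampleBot` and `OtherCrawler` agent names and the example.com URLs are all made up for illustration, and nothing here claims to reflect how any particular search engine implements its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site: ExampleBot may crawl
# everything except /private/, and every other crawler is asked to stay out.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("ExampleBot", "http://example.com/index.html"),
        ("ExampleBot", "http://example.com/private/notes.html"),
        ("OtherCrawler", "http://example.com/index.html"),
    ]
    for agent, url in checks:
        verdict = "allowed" if can_crawl(ROBOTS_TXT, agent, url) else "disallowed"
        print(f"{agent} -> {url}: {verdict}")
```

Note that the file only states the site owner's wishes. The check is performed by the crawler itself, so it works only for crawlers that choose to honor it.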

Companies: google, microsoft, yahoo

Comments on “Search Engine Cache Isn't Copyright Infringement”

eleete (user link) says:


The robots file is good also, but why not just password ‘protect’ the content? That is available to them also. Fact is, there are many ways a content publisher can remove their content from a search engine. Many users trying to get content IN to the engines make one small mistake and find themselves out or on page 12 of the results. Sounds like they were enjoying the traffic until something they didn’t approve of happened. Then the copyfight broke out. I agree with this ruling. If you don’t want your content out there, protect it yourself. There are many things a content owner can do to avoid this. I think this particular case was about getting money out of the big boys. Greed.

Dosquatch says:

Not the site

In spite of the cached copy, a search engine is NOT the cached site. A search engine is, roughly, a travel guide. A conveniently arranged source of information that tells you how to get where you want to go.

Or, in the words of a particular Eastern religion, the finger pointing at the moon is not the moon.

jonnyq says:

Re: Not the site

Do a google search and click “cached version” next to a result. That’s what we’re talking about.

I’m kinda on the fence here. It seems like we’re carving out a copyright exception to deal with the way the digital world works instead of just fixing the law in the first place.

I’d rather see a few less exceptions carved out long enough to force the law to be fixed.

But, I could be wrong, and the law could be correct in the first place.

John Doe says:

Interesting problem here. Funny that you mention that it is up to the website operator to use the Robots file to stop search engine indexing. That is like saying a guy should lock his doors if he doesn’t want robbed. Well yea, but that still doesn’t excuse the robber. And just like a robber breaking a window, the search engine could ignore the robots file.

This whole problem though, does go right along with your ideas about reworking copyright laws. Allowing copyright laws to block search engines would severely hinder “progress”. But then again, search engines profit from this data which technically violates copyright law.

Anonymous Coward says:

Re: Re:

I just disagree here. I think these claims about search engine caches or even thumbnail images being copyright infringement are ridiculous.
If you want to use a good analogy I would look to Hollywood. Many celebs have their name protected under copyright laws. Say a celebrity then decides to get a phone hooked up in their home and does not take the time to get their number unlisted. Now if the new phone book came out and included the celeb in the listing, would this fall under a copyright violation? These kinds of lawsuits are on par with “I spilled coffee on my lap, now I will sue McDonalds for 3 million”. Just like a celeb would need to take steps to get their number unlisted, so would a web designer need to take steps to incorporate a robots.txt file. Most people want Google and Yahoo to scan and index their site.
Now if Sean Penn said, “What, my name’s in the phone book!! That’s it, I am going to court!!”, a judge would throw the case out of court. Unfortunately, when it comes to technology, a majority of people are not familiar enough with the environment to really make a good decision on it.

Mike (profile) says:

Re: Re:

That is like saying a guy should lock his doors if he doesn’t want robbed. Well yea, but that still doesn’t excuse the robber. And just like a robber breaking a window, the search engine could ignore the robots file.

No, not even remotely close. Putting a website online is specifically and proactively saying “here it is, we’re open for business!” When one computer sends a request to visit the other, the other actively welcomes it and tells you to make a copy of it.

But then again, search engines profit from this data which technically violates copyright law.

Profiting does not technically violate copyright law. It’s one factor of many, and there are plenty of cases where companies have been allowed to “profit” from others’ copyrighted works. In fact, it’s quite common.

Ben Robinson (user link) says:

Robber analogy

I think that is a terrible analogy. More apt would be to say you have your doors wide open to the whole world but get pissed when one particular person comes in. Robots.txt is like a little sign out front that can say “No Google Allowed”.

Sure, they could ignore the sign, but 1. they won’t ignore it and 2. if they did, at least then you would have a legitimate gripe.

Dex says:

Re: Re: Robber analogy

That analogy makes no sense: no one is “taking” anything. It’s more like they went in and took a picture of your stuff, and then let people look at the pictures, and told people how to go see your stuff for themselves if they so desire.

Why are people putting content on the web if they don’t want people to find it? This business model makes no sense to me.

John Doe says:

Re: Re: Re: Robber analogy

You are taking a picture of “copyrighted” material. You can’t take a photo of a painting and use it for profit.

You did see that this article is about caching the content and not just indexing it? Indexing wouldn’t be a problem. You are just reporting you found these words on this site. Caching the actual site is making a copy of it and does run afoul of the copyright law. This kind of thing is exactly why Mike says copyright law needs to be re-thought out.

Dosquatch says:

Re: Re: Re:2 Robber analogy

You are taking a picture of “copyrighted” material. You can’t take a photo of a painting and use it for profit.

Not in and of itself, no, but as part of a larger creative work, like a collage, you most certainly can. If one wanted to argue that Google’s indexing of the world is a creative endeavor.

Or as a catalog of factual data. If one wanted to argue that all the search engine is doing is publishing a factual list of things (“sites”) and their locations.

Or a travel guide.

Or perhaps dozens of other examples.

And, neverminding that Google does not in fact profit from the cache views – they sell no advertising on those pages, and they PROMINENTLY announce that it is a cache view and NOT the actual site, so there is no argument of “confusion” here, either.

Dosquatch says:

Re: Re: Robber analogy

Your computer on your desk in your house, you expect to be private. Were “Teh Goog” caching that for public search, I can see you getting bunged.

A website is a billboard on a busy freeway. All Google does is say “There’s a billboard here”. One puts a billboard on a busy freeway, presumably, because one wants it to be seen. Having a 3rd party point at it and say, “look over there” hardly seems to be “taking” your content. That is, nobody goes to Google to see your billboard, they go to Google to see where to find your billboard.

As for the browser caching a copy, yes they do and always have. So putting stuff on the internet means you accept that behavior.

Indeed. So who are these webmasters who are unfamiliar with Google and its purpose, again?

John Doe says:

Re: Re: Re: Robber analogy

Obviously the ones suing are unfamiliar. To continue the devil’s advocate role, who says search engines have to cache content? By indexing and returning searchers to the site, they see whatever is there at that moment. By caching the site, they have created a copy of “copyrighted” work for re-display. This is a very fine point here.

Dosquatch says:

Re: Re: Re:2 Robber analogy

Obviously the ones suing are unfamiliar.

Ah. Fine. Perhaps they’ve been under a rock. It could happen.

Say you write a book. It’s a great book. You turn it over to $BigPublishingHouse… but then you notice that it’s in bookstores! Those BASTARDS! Just what do they think they’re up to??!? You had no idea they were going to sell it, fer Christ sake!

You had the reasonable expectation that nobody would ever know about your book unless you told them of it personally, or … I don’t know … maybe magic fairies clubbed them with the Stick of Obscure Realization or something.

But to think $BigPublishingHouse would take your manuscript and make it public?? Unthinkable.

Anonymous Coward says:

Re: Robber analogy

The robber analogy is flawed, because the robber is taking something away from the house, while the cache is just making a (usually inferior) copy of it; a more appropriate analogy would be looking in an open window and taking a note of what is inside the house.

I would also like to note that:

a) cached pages are out of date;

b) most cached pages don’t include much of the content of the original page, or it doesn’t work properly (Javascript)

Michial (user link) says:

robots.txt is useless in a lot of cases

The rules you set in robots.txt are only as good as the search spider’s willingness to follow them. It seems only the major engines abide by your wishes.

In many cases the search engines are doing a serious injustice by caching the content of sites. Especially in the case of my site, which is entirely data driven.

Depending on the entry page and the options chosen, the content will be different from person to person, and since there is a live system in use by end users, even the content of a given page is outdated in a matter of minutes in some cases…

All in all I have a robots.txt file, and then I also monitor my logs for spiders and just block their IP addresses.
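
As a rough sketch of the log-monitoring approach described above, the snippet below scans access-log lines for user agents that look like spiders and counts requests per IP address. It assumes the common "combined" log format, and the `SPIDER_HINTS` substrings, the sample log lines and the `spider_ips` helper are all hypothetical. A crawler that disguises its user agent would slip past a check like this, and the actual blocking (firewall or server configuration) is left out.

```python
import re
from collections import Counter

# Combined log format, assumed here:
# IP ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

# Hypothetical substrings that suggest an automated crawler.
SPIDER_HINTS = ("bot", "spider", "crawl")

# Made-up sample entries using documentation-only IP ranges.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2007:13:55:36 -0700] "GET /a HTTP/1.1" 200 2326 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [10/Oct/2007:13:56:01 -0700] "GET /a HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.5 - - [10/Oct/2007:13:56:40 -0700] "GET /b HTTP/1.1" 200 1187 "-" "ExampleBot/1.0"',
]


def spider_ips(lines):
    """Count requests per IP whose user agent contains a spider hint."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the assumed format
        agent = match.group("agent").lower()
        if any(hint in agent for hint in SPIDER_HINTS):
            hits[match.group("ip")] += 1
    return hits


if __name__ == "__main__":
    for ip, count in spider_ips(SAMPLE_LOG).most_common():
        print(f"{ip}: {count} spider-like requests")
```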

InanimateOne says:

No Index Meta Tags

I knew prior to ever uploading my website that search engines kept a cached copy of pages, that I needed to specify what pages I did not want indexed, and that I needed a robots.txt file. The fact this guy didn’t know that is his fault, not the search engines’. The “noindex, nofollow” meta tag works great. Kind of ironic that a simple Google search for “do not index” could have saved this dude a lot of trouble.

DitchDigger says:

Not quite black & white (new robbery analogy)

The robbery analogy somewhere up in this thread oversimplifies (as is pretty common to ‘puter folk) the problem. Even in the brick & mortar world, walking into a house through an unlocked door does not constitute a robbery. At worst, the act’s considered trespassing. That analogy would only apply if the header had a Nofollow/Noindex/nocache or some such, which may be seen as a “lock”, but which I’d say is more of a “No Solicitors” sign on the door. And even an .htaccess file does not make an *explicit* lock; it only brings on a more interesting question: is the expectation of privacy on the net justified, and, if so, why?

Ed says:

The problem seems two fold.

1) Google profiting from caching: Since Google does not “seem” to put their own ads on cached pages, I do not see that they are directly profiting from the caching. They are making money on the ads on their indexing pages, not their caching.

2) Copyright truly needs to have limits attached. Not only of duration, but also of the owner’s absolute control. As far as I know, a library does not have to pay special fees to purchase a book in order to lend it to the public. It is understood that this is in the public interest. Yet I can easily imagine some authors screaming “They are accessing my work for free! Each reader owes me X dollars!”. This would be ridiculous, but I bet some think it appropriate. The same could reasonably be applied to Google. Yes, I realize the library paid for the one copy of the book, and Google did not, but neither did it really cost the author/publisher anything for Google to possess a copy, while the library has paper/printing/binding to pay for.

And really, who goes to the cached page anyway, if the real page is up, and intact? I certainly only use them as backup access.

Anonymous Coward says:

I’m not sure caching, in and of itself, is the issue. You have to download a copy to view it in your browser, whether it is saved in RAM or on a hard drive. I suppose you could try to re-implement browsers such that you receive a stream and they must parse that data into the appropriately tokenized tree.

What makes this special in terms of search engines is that the cached copy is available – it’s being redistributed by the search engine.

Robots.txt does not cover caching. That’s what the appropriate meta tags are for. You can be indexed but not cached with the correct setup, provided the search engines decide to obey what is basically a voluntary standard.

Remember, before the people who cloaked sites got wise, how you’d hit the cache of a paywalled site in Google? That’s what this is about. And, of course, trying to suck blood from the lumbering behemoth which is Google.
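
To illustrate the distinction above between being indexed and being cached, here is a small sketch that reads the robots meta tag from a page using Python's standard `html.parser`. The `noarchive` value is the token site owners have used to ask engines not to offer a cached copy; which engines honor it is outside this example. The sample page and the `may_show_cached_copy` helper are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None for valueless attributes.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def may_show_cached_copy(html: str) -> bool:
    """Hypothetical policy check: no cached copy if the page says noarchive."""
    return "noarchive" not in robots_directives(html)


# Made-up page that asks to be indexed but not cached.
SAMPLE_PAGE = (
    '<html><head><meta name="robots" content="index, noarchive">'
    "</head><body>Hello</body></html>"
)

if __name__ == "__main__":
    print(sorted(robots_directives(SAMPLE_PAGE)))
    print(may_show_cached_copy(SAMPLE_PAGE))
```

As with robots.txt, the tag is a request: the crawler has to fetch the page to see it, and then decide to comply.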

Rodney Dunham says:

Publishing text/images/movies/audio/whatever to a website is just that, publishing. Making available to the great unwashed for free. Anyone who doesn’t get that should really hire someone who does before they make a website.

If anyone wants information to be unsearchable, they need only make a page with a button (not set to be a “submit” type, but which does an onclick=form.submit) that you have to press to proceed to the “protected” content. Then make sure not to link to that content any other way, or include it in your sitemap. Spiders won’t see the protected stuff, humans won’t have any trouble getting to it.

…but the whole idea of the web is to share information, so this seems like a bass-ackwards way to use it right from the start.

Jane Somebody says:

Point of the Web

The web was created to enable the free and easy distribution of information (Scientific info at that time). If you choose to use something that was created for sharing information freely, then abide by that or create gates/locks to your information.

The person who is suing is an idiot. If they put info out there for me to read, I can easily print it, copy it, store it. I don’t need permission to use the technology that the info was put on.

PRMan (profile) says:

Robber analogy

Funny that you mention that it is up to the website operator to use the Robots file to stop search engine indexing. That is like saying a guy should lock his doors if he doesn’t want robbed.

This is exactly like leaving your doors unlocked. Anyone can come in unless you:

1. Tell them that they can no longer come in
2. Put up a no trespassing sign
3. Lock your doors

At any of those points, if someone then goes in anyway THEN they can say the guy entered illegally. ROBOTS.TXT is the web’s way of posting a No Trespassing sign. You can also use a META tag, which is more like telling individuals that they cannot come in. Or, you can password-protect, which is like locking your doors.

If a guy goes into your house (with unlocked doors), takes a picture of the insides of your house and goes to the town square and shows everyone the pictures and you complained to the cops, the cops would laugh at you and tell you to install door locks or put up a No Trespassing sign.

This is EXACTLY the same, and the judge made a good ruling.

Dosquatch says:

Re: Robber analogy

If a guy goes into your house

So, so close. It has to do with the expectation that a random person should be granted entry. Even if the doors on your house are unlocked, there’s a certain reasonable expectation that not just any random schmoe should be wandering through your living room.

Replace “house” with something of a more public venue, like a convenience store. You expect public traffic. It’s pretty much kinda the entire point… like a website. And then, say, there’s this weird kid in town who, instead of cutting grass, has figured out how to make a few bucks telling other people what you have in your store. Not selling it, not taking away from your profits, more likely increasing your profits by sending extra people to your wares.

He doesn’t charge you a cent. He doesn’t cost you a cent. He increases your traffic and profits. He saves you a certain amount of contracting out advertising on your own.

Boy howdy, it’s the wonder kid. One would think a “thank you” might be appropriate.

One would think.

Randall Krause (profile) says:

Copyright is a guaranteed right, not a voluntary right

It amazes me how many people do not even comprehend copyright law in the U.S. Copyright was established to strike a balance between the interests of the public to gain access to artistic works while encouraging creators to continue to create artistic works as commodities for public consumption.

All of the analogies here are completely ridiculous. Breaking into someone’s house? Photographing people’s wares? Looking at people naked?!

A far better analogy is that somebody creates a lounge for people to come listen to music for free. Then the Yellow Pages stops in and, besides merely adding the venue information to its phone book, encloses a free DVD with a reproduction of the recordings being played at the lounge in their entirety, unless the venue owners explicitly post on their front door “unauthorized recording is prohibited.”

Well last I checked, copyright law is not a voluntary right. It is a guaranteed right with the only conditional exemption being fair use — which primarily applies to personal, non-commercial uses. So the venue owners shouldn’t have to post anything to secure the protections of copyright against willful infringement.

To say that Google is NOT violating copyright law, is setting the legal precedent that “it is okay to make reproductions of publicly performed copyrighted works available in their entirety so long as you claim they are a cached copy from a search engine”.

Thus pirates can now freely exchange music and movies by simply creating a “search engine” that records radio shows and TV programs (in which the public performance was provided free of charge) and makes those “cached copies” available without charge to the public while generating revenue from ancillary advertising on the search results pages.

Heck, I see some interesting business prospects here.

