October 24, 2008 at 5:48 am

robots.txt

The robots file is good also, but why not just password ‘protect’ the content? That is available to them also. Fact is, there are many ways a content publisher can remove their content from a search engine. Many users trying to get content IN to the engines make one small mistake and find themselves out or on page 12 of the results. Sounds like they were enjoying the traffic until something they didn’t approve of happened. Then the copyfight broke out. I agree with this ruling. If you don’t want your content out there, protect it yourself. There are many things a content owner can do to avoid this. I think this particular case was about getting money out of the big boys. Greed.

Benjie

October 24, 2008 at 6:18 am

Public

If the data/information is publicly available, then it’s the website’s fault.

If the data is able to be indexed/scanned because proper security measures aren’t taken, then it’s the website’s fault.

If someone released non-public data against the website’s wishes, then the search engine should purge that data.

Dosquatch

October 24, 2008 at 6:26 am

Not the site

In spite of the cached copy, a search engine is NOT the cached site. A search engine is, roughly, a travel guide. A conveniently arranged source of information that tells you how to get where you want to go.

Or, in the words of a particular Eastern religion, the finger pointing at the moon is not the moon.

jonnyq

October 24, 2008 at 8:05 am

Re: Not the site

Do a Google search and click “cached version” next to a result. That’s what we’re talking about.

I’m kinda on the fence here. It seems like we’re carving out a copyright exception to deal with the way the digital world works instead of just fixing the law in the first place.

I’d rather see a few less exceptions carved out long enough to force the law to be fixed.

But, I could be wrong, and the law could be correct in the first place.

Dosquatch

October 24, 2008 at 8:43 am

Re: Re: Not the site

Like I said, in these comments even, they PROMINENTLY announce that it is a cache view and NOT the actual site, so there is no argument of “confusion” here.

John Doe

October 24, 2008 at 6:41 am

Interesting problem here. Funny that you mention that it is up to the website operator to use the Robots file to stop search engine indexing. That is like saying a guy should lock his doors if he doesn’t want robbed. Well yea, but that still doesn’t excuse the robber. And just like a robber breaking a window, the search engine could ignore the robots file.

This whole problem though, does go right along with your ideas about reworking copyright laws. Allowing copyright laws to block search engines would severely hinder “progress”. But then again, search engines profit from this data which technically violates copyright law.

nasch

October 24, 2008 at 8:37 am

Re: Re:

It’s more like, if you’re going to put your stuff out on the sidewalk, if you don’t want people to take it you should put up a sign that says please don’t take this stuff.

Anonymous Coward

October 24, 2008 at 9:41 am

Re: Re:

I just disagree here. I think these claims about search engine caches or even thumbnail images being copyright infringement are ridiculous.
If you want to use a good analogy I would look to Hollywood. Many celebs have their name protected under copyright laws. Say a celebrity then decides to get a phone hooked up in their home and they did not take the time to get their number unlisted. Now if the new phone book came out and included the celeb in the listing, would this fall under a copyright violation? These kinds of lawsuits are on par with the, “I spilled coffee on my lap now I will sue McDonalds for 3 million”. Just like a celeb would need to take steps to get their number unlisted, so would a web designer need to take steps to incorporate a robots.txt file. Most people want Google and Yahoo to scan and index their site.
Now if Sean Penn said, “What, my name’s in the phone book!! That’s it, I am going to court!!”, a judge would throw the case out of court. Unfortunately, when it comes to technology, a majority of people are not familiar enough with the environment to really make a good decision on it.

DanC

October 24, 2008 at 10:07 am

Re: Re:

But then again, search engines profit from this data which technically violates copyright law.

Incorrect. It is not against the law to profit from copyrighted works if it falls under fair use.

Mike (profile)

October 24, 2008 at 10:37 am

Re: Re:

That is like saying a guy should lock his doors if he doesn’t want robbed. Well yea, but that still doesn’t excuse the robber. And just like a robber breaking a window, the search engine could ignore the robots file.

No, not even remotely close. Putting a website online is specifically and proactively saying “here it is, we’re open for business!” When one computer sends a request to visit the other, the other actively welcomes it and tells you to make a copy of it.

But then again, search engines profit from this data which technically violates copyright law.

Profiting does not technically violate copyright law. It’s one factor of many, and there are plenty of cases where companies have been allowed to “profit” from others’ copyrighted works. In fact, it’s quite common.

Anonymous Coward

October 24, 2008 at 6:46 am

Cached sites

Correct me if I am wrong, but when you browse any site, your browser downloads a copy of the images and pages to your browser cache on your local computer. Does this mean that every browser out there was considered illegal by these commenters?

Ben Robinson (user link)

October 24, 2008 at 6:52 am

Robber analogy

I think that is a terrible analogy. More apt would be to say you have your doors wide open to the whole world but get pissed when one particular person comes in. Robots.txt is like a little sign out front that can say “No Google Allowed”.

Sure, they could ignore the sign, but 1. they won’t ignore it and 2. if they did, at least then you would have a legitimate gripe.
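To make the “sign out front” concrete, here is a minimal sketch using Python’s standard `urllib.robotparser`. The robots.txt contents, the `may_fetch` helper, and the example.com URL are made up for illustration; the sketch only shows how a crawler that chooses to honor the file would read it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a "No Google Allowed" sign that lets everyone else in.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, may_fetch(agent, "http://example.com/page.html"))
```

Nothing here enforces anything: the check runs on the crawler’s side, which is why the sign only works on visitors willing to read it.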

John Doe

October 24, 2008 at 6:58 am

Re: Robber analogy

Ah, but you left the doors wide open for them to “view” the goods not “take” them.

As for the browser caching a copy, yes they do and always have. So putting stuff on the internet means you accept that behavior.

Jim

October 24, 2008 at 7:09 am

Re: Re: Robber analogy

Not to be mean, but you have twice demonstrated the fact that you don’t understand how search engines work.

John Doe

October 24, 2008 at 7:14 am

Re: Re: Re: Robber analogy

Would you care to elaborate? Making statements like this without backing it up is kind of pointless.

Dex

October 24, 2008 at 7:22 am

Re: Re: Robber analogy

That analogy makes no sense: no one is “taking” anything. It’s more like they went in and took a picture of your stuff, and then let people look at the pictures, and told people how to go see your stuff for themselves if they so desire.

Why are people putting content on the web if they don’t want people to find it? This business model makes no sense to me.

John Doe

October 24, 2008 at 7:28 am

Re: Re: Re: Robber analogy

You are taking a picture of “copyrighted” material. You can’t take a photo of a painting and use it for profit.

You did see that this article is about caching the content and not just indexing it? Indexing wouldn’t be a problem. You are just reporting you found these words on this site. Caching the actual site is making a copy of it and does run afoul of the copyright law. This kind of thing is exactly why Mike says copyright law needs to be re-thought out.

Dosquatch

October 24, 2008 at 7:43 am

Re: Re: Re:² Robber analogy

You are taking a picture of “copyrighted” material. You can’t take a photo of a painting and use it for profit.

Not in and of itself, no, but as part of a larger creative work, like a collage, you most certainly can. If one wanted to argue that Google’s indexing of the world is a creative endeavor.

Or as a catalog of factual data. If one wanted to argue that all the search engine is doing is publishing a factual list of things (“sites”) and their locations.

Or a travel guide.

Or perhaps dozens of other examples.

And, neverminding that Google does not in fact profit from the cache views – they sell no advertising on those pages, and they PROMINENTLY announce that it is a cache view and NOT the actual site, so there is no argument of “confusion” here, either.

Dosquatch

October 24, 2008 at 7:31 am

Re: Re: Robber analogy

Your computer on your desk in your house, you expect to be private. Were “Teh Goog” caching that for public search, I can see you getting bunged.

A website is a billboard on a busy freeway. All Google does is say “There’s a billboard here”. One puts a billboard on a busy freeway, presumably, because one wants it to be seen. Having a 3rd party point at it and say, “look over there” hardly seems to be “taking” your content. That is, nobody goes to Google to see your billboard, they go to Google to see where to find your billboard.

As for the browser caching a copy, yes they do and always have. So putting stuff on the internet means you accept that behavior.

Indeed. So who are these webmasters who are unfamiliar with Google and its purpose, again?

John Doe

October 24, 2008 at 7:34 am

Re: Re: Re: Robber analogy

Obviously the ones suing are unfamiliar. To continue the devil’s advocate role, who says search engines have to cache content? By indexing and returning searchers to the site, they see whatever is there at that moment. By caching the site, they have created a copy of “copyrighted” work for re-display. This is a very fine point here.

Dosquatch

October 24, 2008 at 7:56 am

Re: Re: Re:² Robber analogy

Obviously the ones suing are unfamiliar.

Ah. Fine. Perhaps they’ve been under a rock. It could happen.

Say you write a book. It’s a great book. You turn it over to $BigPublishingHouse… but then you notice that it’s in bookstores! Those BASTARDS! Just what do they think they’re up to??!? You had no idea they were going to sell it, fer Christ sake!

You had the reasonable expectation that nobody would ever know about your book unless you told them of it personally, or … I don’t know … maybe magic fairies clubbed them with the Stick of Obscure Realization or something.

But to think $BigPublishingHouse would take your manuscript and make it public?? Unthinkable.

Anonymous Coward

October 24, 2008 at 7:38 am

Re: Robber analogy

The robber analogy is flawed, because the robber is taking something away from the house, while the cache is just making a (usually inferior) copy of it; a more appropriate analogy would be looking in an open window and taking a note of what is inside the house.

I would also like to note that:

a) cached pages are out of date;

b) most cached pages don’t include much of the content of the original page, or it doesn’t work properly (Javascript).

John Doe

October 24, 2008 at 7:44 am

Re: Re: Robber analogy

The “flaw” is the whole crux of Mike’s argument about copyright law. This goes back to copying music, video, games or digital whatever.

Michial (user link)

October 24, 2008 at 6:54 am

robots.txt is useless in a lot of cases

The rules you set in robots.txt are only as good as the search spider’s willingness to follow them. It seems only the major engines abide by your wishes.

In many cases the search engines are doing a serious injustice by caching the content of sites, especially in the case of my site, which is entirely data driven.

Depending on the entry page and the options chosen, the content will be different from person to person, and since there is a live system in use by end users, even the content of a given page is outdated in a matter of minutes in some cases…

All in all I have a robots.txt file, and then I also monitor my logs for spiders and just block their IP addresses.
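For anyone wanting to do the log-watching part, a rough sketch in Python follows. It assumes the server writes the common “combined” access-log format; the `spider_ips` helper, the marker list, and the sample lines are invented for illustration, and a spider that lies about its User-Agent will slip straight past this.

```python
import re

# Assumption: "combined" log format, where the User-Agent is the last quoted field.
LOG_LINE = re.compile(r'^(?P<ip>\S+) .*"(?P<agent>[^"]*)"$')

# Hypothetical substrings that suggest a crawler; tune for your own traffic.
SPIDER_MARKERS = ("bot", "spider", "crawler")


def spider_ips(log_lines):
    """Return the set of client IPs whose User-Agent looks like a spider."""
    ips = set()
    for line in log_lines:
        match = LOG_LINE.match(line.strip())
        if match and any(m in match.group("agent").lower() for m in SPIDER_MARKERS):
            ips.add(match.group("ip"))
    return ips


# Made-up sample entries using documentation-only IP ranges.
SAMPLE_LOG = [
    '203.0.113.7 - - [24/Oct/2008:06:54:00 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.4 - - [24/Oct/2008:06:55:10 +0000] "GET /a HTTP/1.1" 200 734 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    print(sorted(spider_ips(SAMPLE_LOG)))
```

The resulting set is only a candidate list; the actual blocking would still happen in the web server or firewall configuration.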

InanimateOne

October 24, 2008 at 7:54 am

No Index Meta Tags

I knew prior to ever uploading my website that search engines kept a cached copy of pages, that I needed to specify what pages I did not want indexed, and that I needed a robots.txt file. The fact this guy didn’t know that is his fault, not the search engines’. The “no follow, no index” meta tag works great. Kind of ironic that a simple Google search for “do not index” could have saved this dude a lot of trouble.

DitchDigger

October 24, 2008 at 7:58 am

Not quite black & white - (new robbery analogy)

The robbery analogy somewhere up in this thread oversimplifies (as is pretty common to ‘puter folk) the problem. Even in the brick & mortar world, walking into a house through an unlocked door does not constitute a robbery. At worst, the act’s considered trespassing. That analogy would only apply if the header had a Nofollow/Noindex/nocache or some such – which may be seen as a “lock”, but I’d say more of a “No Solicitors” sign on the door. And even an .htaccess file does not an *explicit* lock make – it only brings on a more interesting question – is the expectation of privacy on the net justified, and, if so, why?

Ed

October 24, 2008 at 8:40 am

The problem seems two fold.

1) Google profiting from caching: Since Google does not “seem” to put their own ads on cached pages, I do not see that they are directly profiting from the caching. They are making money on the ads on their indexing pages, not their caching.

2) Copyright truly needs to have limits attached. Not only of duration, but also of the owner’s absolute control. As far as I know, a library does not have to pay special fees to purchase a book in order to lend it to the public. It is understood that this is in the public interest. Yet I can easily imagine some authors screaming “They are accessing my work for free! Each reader owes me X dollars!”. This would be ridiculous, but I bet some think it appropriate. The same could reasonably be applied to Google. Yes, I realize the library paid for the one copy of the book, and Google did not, but neither did it really cost the author/publisher anything for Google to possess a copy, while the library has paper/printing/binding to pay for.

And really, who goes to the cached page anyway, if the real page is up, and intact? I certainly only use them as backup access.

Dosquatch

October 24, 2008 at 8:47 am

Re: Re:

And really, who goes to the cached page anyway, if the real page is up, and intact? I certainly only use them as backup access.

They’re also handy to peek at search results that might otherwise get swallowed by *ork’s web filters.

Anonymous Coward

October 24, 2008 at 8:55 am

I’m not sure caching, in and of itself, is the issue. You have to download a copy to view in your browser – whether or not this is saved in RAM or on a hard drive, as well. I suppose you could try to re-implement browsers such that you receive a stream and they must parse that data into the appropriately tokenized tree.

What makes this special in terms of search engines is that the cached copy is available – it’s being redistributed by the search engine.

Robots.txt does not cover caching. That’s what the appropriate meta tags are for. You can be indexed but not cached with the correct setup, provided the search engines decide to obey what is basically a voluntary standard.
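As a rough illustration of the “indexed but not cached” setup, the sketch below uses Python’s standard `html.parser` to read the directives out of a robots meta tag. The page markup and the `RobotsMetaParser` and `robots_directives` names are hypothetical; `noarchive` is the directive commonly used to ask engines not to show a cached copy, and whether a given engine obeys it is up to that engine.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for item in attr.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Made-up page that asks to be indexed but not cached.
PAGE = '<html><head><meta name="robots" content="index, noarchive"></head><body>hi</body></html>'

if __name__ == "__main__":
    found = robots_directives(PAGE)
    print("may index:", "noindex" not in found)
    print("may cache:", "noarchive" not in found)
```

A polite crawler would run a check like this after fetching the page, which is a separate step from the robots.txt check it makes before fetching.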

Remember, before the people who cloaked sites got wise, how you’d hit the cache of a paywalled site in Google? That’s what this is about. And, of course, trying to suck blood from the lumbering behemoth which is Google.

Rodney Dunham

October 24, 2008 at 9:21 am

Publishing text/images/movies/audio/whatever to a website is just that, publishing. Making available to the great unwashed for free. Anyone who doesn’t get that should really hire someone who does before they make a website.

If anyone wants information to be unsearchable, they need only make a page with a button (not set to be a “submit” type, but which does an onclick=form.submit) that you have to press to proceed to the “protected” content. Then make sure not to link to that content any other way, or include it in your sitemap. Spiders won’t see the protected stuff, humans won’t have any trouble getting to it.

…but the whole idea of the web is to share information, so this seems like a bass-ackwards way to use it right from the start.

Jane Somebody

October 24, 2008 at 9:33 am

Point of the Web

The web was created to enable the free and easy distribution of information (Scientific info at that time). If you choose to use something that was created for sharing information freely, then abide by that or create gates/locks to your information.

The person who is suing is an idiot. If they put info out there for me to read, I can easily print it, copy it, store it. I don’t need permission to use the technology that the info was put on.

David

October 24, 2008 at 10:51 am

Wow, really glad about this ruling. After the way the Perfect 10 v. Google lawsuits were going I was thinking that some copyright decisions were going to make it very hard for search engines to keep operating the way we expect them to.

PRMan (profile)

October 24, 2008 at 11:49 am

Robber analogy

Funny that you mention that it is up to the website operator to use the Robots file to stop search engine indexing. That is like saying a guy should lock his doors if he doesn’t want robbed.

This is exactly like leaving your doors unlocked. Anyone can come in unless you:

1. Tell them that they can no longer come in
2. Put up a no trespassing sign
3. Lock your doors

At any of those points, if someone then goes in anyway THEN they can say the guy entered illegally. ROBOTS.TXT is the web’s way of posting a No Trespassing sign. You can also use a META tag, which is more like telling individuals that they can not come in. Or, you can password-protect, which is like locking your doors.

If a guy goes into your house (with unlocked doors), takes a picture of the insides of your house and goes to the town square and shows everyone the pictures and you complained to the cops, the cops would laugh at you and tell you to install door locks or put up a No Trespassing sign.

This is EXACTLY the same, and the judge made a good ruling.

Dosquatch

October 24, 2008 at 1:29 pm

Re: Robber analogy

If a guy goes into your house

So, so close. It has to do with the expectation that a random person should be granted entry. Even if the doors on your house are unlocked, there’s a certain reasonable expectation that not just any random schmoe should be wandering through your living room.

Replace “house” with something of a more public venue, like a convenience store. You expect public traffic. It’s pretty much kinda the entire point… like a website. And then, say, there’s this weird kid in town who, instead of cutting grass, has figured out how to make a few bucks telling other people what you have in your store. Not selling it, not taking away from your profits, more likely increasing your profits by sending extra people to your wares.

He doesn’t charge you a cent. He doesn’t cost you a cent. He increases your traffic and profits. He saves you a certain amount of contracting out advertising on your own.

Boy howdy, it’s the wonder kid. One would think a “thank you” might be appropriate.

One would think.

satan (user link)

October 24, 2008 at 2:23 pm

Kind of like walkin outside naked

and suing people for looking at you.

No Six Pack

October 24, 2008 at 5:01 pm

Google should remove said sites from the search results.
This would solve the complaint, no?

Estigy (user link)

October 27, 2008 at 7:09 am

How is "search engine" defined?

Can somebody please tell me, where I can look up the (legal) definition of a search engine?

I wonder what the borders look like that separate a (big and very general) search engine like Google from specialized ones like auto-generated link lists…

Thanks, A.

MansickWRONGAgain

October 28, 2008 at 8:37 am

Mike: Don't ever pick horses

Mike:

Clearly, without question, your legal instincts are bad. See today’s Google story for proof.

My guess is you’re still hoping for a Ron Paul upset in the election.

Time to get out of the basement and see the world for what it REALLY is.

Randall Krause (profile)

August 15, 2011 at 10:18 am

Copyright is a guaranteed right, not a voluntary right

It amazes me how many people do not even comprehend copyright law in the U.S. Copyright was established to strike a balance between the interests of the public to gain access to artistic works while encouraging creators to continue to create artistic works as commodities for public consumption.

All of the analogies here are completely ridiculous. Breaking into someone’s house? Photographing people’s wares? Looking at people naked?!

A far better analogy is that somebody creates a lounge for people to come listen to music for free. Then the Yellow Pages stops in and besides merely adding the venue information to its phone book, encloses a free DVD with a reproduction of the recordings being played at the lounge in their entirety — unless the venue owners explicitly post on its front door “unauthorized recording is prohibited.”

Well last I checked, copyright law is not a voluntary right. It is a guaranteed right with the only conditional exemption being fair use — which primarily applies to personal, non-commercial uses. So the venue owners shouldn’t have to post anything to secure the protections of copyright against willful infringement.

To say that Google is NOT violating copyright law, is setting the legal precedent that “it is okay to make reproductions of publicly performed copyrighted works available in their entirety so long as you claim they are a cached copy from a search engine”.

Thus pirates can now freely exchange music and movies by simply creating a “search engine” that records radio shows and TV programs (in which the public performance was provided free of charge) and makes those “cached copies” available without charge to the public while generating revenue from ancillary advertising on the search results pages.

Heck, I see some interesting business prospects here.

–Randall


Search Engine Cache Isn't Copyright Infringement

from the good-news-for-search-engines dept

Comments on “Search Engine Cache Isn't Copyright Infringement”
