Search Engines Should Ignore Bossy Publishers

from the disallow dept

Thu, Dec 6th 2007 09:55am - Timothy Lee

James Grimmelman has an in-depth look at ACAP, the new "standard" for website access control that we discussed last Friday. I put "standard" in scare quotes because, as Grimmelman points out, the specs clearly weren't written by people with any experience in writing technical standards. While a well-written standard will very precisely specify which behaviors are required, which are prohibited, and under what circumstances, the ACAP spec is full of vague directives and confusing terminology. Some parts of the standard are apparently designed to “only be interpreted by prior arrangement.” Also, despite the "1.0" branding, the latest version of the specification has several sections that are labeled "not yet fully ready for implementation." It is, in short, a big mess.

Of course, this shouldn't surprise us, because it's not really a technical standard at all. Robots.txt works just fine for almost everyone, and search engines aren't clamoring to replace it. Rather, some publishers are using the trappings of a technical standard to try to micromanage the uses to which search engines put their content, and they're laying the groundwork for lawsuits if search engines fail to heed the demands embedded in ACAP files. Not only are the rules vague and confused, but the "standard" also helpfully notes that the rules "may change or be withdrawn without notice." In other words, a search engine that committed to complying with ACAP directives would be setting itself up to have its functionality micromanaged by the publishers who control the ACAP specifications.

Luckily, as Mike pointed out on Friday, search engines have the upper hand here. So here's my suggestion for search engines: instead of trying to comply with every nitpicky detail of the ACAP standard, just announce that every line of an ACAP file will be interpreted as the equivalent of a "Disallow" line in a robots.txt file. Websites would discover pretty quickly that posting ACAP directives on their sites just caused their content to disappear from search engines. As much as they might bluster about other search engines "stealing" their content, the reality is that they can't afford to give up the traffic that search engines send their way. If search engines simply refused to include ACAP-restricted pages in their index, publishers would quickly realize that those old robots.txt files aren't so bad after all.
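To make the suggestion concrete, here is a minimal sketch in Python of what "treat every ACAP line as a Disallow" might look like inside a crawler. It rests on several assumptions that are not from the ACAP spec itself: that ACAP lines can be recognized by an `ACAP-` field prefix, that any value starting with `/` is a path, and that the resulting rules apply to every crawler. The field names in `SAMPLE`, the function names, and the `ExampleBot` user agent are all illustrative.

```python
from urllib.robotparser import RobotFileParser

# Illustrative input only; these field names are an assumption, not quoted from the spec.
SAMPLE = """\
ACAP-crawler: *
ACAP-allow-crawl: /news/
ACAP-disallow-crawl: /archive/
"""


def acap_as_disallow(acap_text: str) -> str:
    """Rewrite every path-bearing ACAP line as a plain robots.txt Disallow rule."""
    rules = ["User-agent: *"]  # assumption: apply the rules to all crawlers
    for raw in acap_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        if not field.lower().startswith("acap-"):
            continue  # ordinary robots.txt line; not handled in this sketch
        if value.startswith("/"):
            # Whatever the publisher meant (allow, restrict, condition), read it as Disallow.
            rules.append(f"Disallow: {value}")
    return "\n".join(rules)


def can_crawl(acap_text: str, url: str, agent: str = "ExampleBot") -> bool:
    """Check a URL against the Disallow-only rules using the standard library parser."""
    parser = RobotFileParser()
    parser.parse(acap_as_disallow(acap_text).splitlines())
    return parser.can_fetch(agent, url)
```

Under this policy even a line that was meant to grant access, such as the hypothetical `ACAP-allow-crawl: /news/` above, removes `/news/` from the index, which is exactly the incentive the proposal relies on.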

Steve R. (profile)

December 6, 2007 at 10:26 am

Publishers may not actually own the content.

James Grimmelman wrote in his report: “All in all, it’s an interesting start. I’m concerned that the publishers will soon argue that failure to respect every last detail expressed in an ACAP file will constitute automatic copyright infringement, breach of contract, trespass to computer systems, a violation of the Computer Fraud and Abuse Act (and related state statutes), trespass vi et armis, highway robbery, land-piracy, misappropriation, alienation of affection, and/or manslaughter.”

I reiterate that these DRM schemes to control access to content fail to consider the fact that the content may not even be owned by the content distributor. Further, if the content is not owned by the distributor and this is discovered, there appears to be no mechanism for this DRM technology to be disabled.

Basically, we are devolving into an economic/legal system where a content distributor can assert ownership without proof and can take adverse action against a so-called “infringer” without due process.

Chronno S. Trigger

December 6, 2007 at 10:28 am

Robot.txt

Just looked up how a robots.txt file worked (never had to use one before). It seems pretty adaptable already. You can tell specific search engines what they can’t look at down to a specific file. Why do they need to create a new one? At most I’d say add an allow function to say that these search engines are allowed despite the disallow *.

Kept searching and found that Google had indexed the NY Times robots.txt file.

“just announce that every line of an ACAP file will be interpreted as the equivalent of a “Disallow” line in a robots.txt file.”

That’s exactly what I would do.

Bob C

December 6, 2007 at 11:10 am

Yawn...

I’ve been saying the search engines should ignore bossy publishers since Techdirt started mentioning the issue. You’ll find comments to that effect after many of your articles.

Glad to see you’ve finally come around to suggesting that yourself.

BTR1701 (profile)

December 6, 2007 at 1:05 pm

ACAP

> If search engines simply refused to include
> ACAP-restricted pages in their index, publishers
> would quickly realize that those old robots.txt
> files aren’t so bad after all.

No, they’d just go pay… err “convince” Congress to pass a law requiring search engines to use ACAP data and/or that treating it as a “disallow” is an unlawful restraint of trade or somesuch nonsense.
