Search Engines Should Ignore Bossy Publishers
from the disallow dept
James Grimmelman has an in-depth look at ACAP, the new "standard" for website access control that we discussed last Friday. I put "standard" in scare quotes because, as Grimmelman points out, the specs clearly weren't written by people with any experience in writing technical standards. While a well-written standard will very precisely specify which behaviors are required, which are prohibited, and under what circumstances, the ACAP spec is full of vague directives and confusing terminology. Some parts of the standard are apparently designed to “only be interpreted by prior arrangement.” Also, despite the "1.0" branding, the latest version of the specification has several sections that are labeled "not yet fully ready for implementation." It is, in short, a big mess.
Of course, this shouldn't surprise us, because it's not really a technical standard at all. Robots.txt works just fine for almost everyone, and search engines aren't clamoring to replace it. Rather, some publishers are using the trappings of a technical standard to try to micromanage the uses to which search engines put their content, and they're laying the groundwork for lawsuits if search engines fail to heed the demands embedded in ACAP files. Not only are the rules vague and confused, but the "standard" also helpfully notes that the rules "may change or be withdrawn without notice." In other words, a search engine that committed to complying with ACAP directives would be setting itself up to have its functionality micromanaged by the publishers who control the ACAP specification.
Luckily, as Mike pointed out on Friday, search engines have the upper hand here. So here's my suggestion for search engines: instead of trying to comply with every nitpicky detail of the ACAP standard, just announce that every line of an ACAP file will be interpreted as the equivalent of a "Disallow" line in a robots.txt file. Websites would discover pretty quickly that posting ACAP directives on their sites just caused their content to disappear from search engines. As much as they might bluster about other search engines "stealing" their content, the reality is that they can't afford to give up the traffic that search engines send their way. If search engines simply refused to include ACAP-restricted pages in their index, publishers would quickly realize that those old robots.txt files aren't so bad after all.
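To make the proposal concrete, here is a minimal sketch of how a crawler might apply the "every ACAP line is a Disallow" rule. It assumes, for illustration only, that ACAP directives appear in robots.txt as lines whose field name begins with "ACAP-" and whose value is a path; the field names in the sample are illustrative, not copied from the ACAP spec. The helper names are hypothetical. Taking the strictest reading, a value that isn't a path blocks the whole site, and the rules apply to every crawler regardless of which User-agent group they sit in.

```python
# Hypothetical policy: every "ACAP-*" line in robots.txt is treated as "Disallow: <path>".
# Assumption: ACAP lines look like "ACAP-<something>: <value>"; ordinary robots.txt
# lines (User-agent, Disallow, ...) are ignored here and handled by the usual parser.

def acap_disallowed_prefixes(robots_txt: str) -> list[str]:
    """Collect one disallowed path prefix per ACAP line, whatever the line says."""
    prefixes = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0]  # drop trailing comments
        field, sep, value = line.partition(":")
        if not sep or not field.strip().lower().startswith("acap-"):
            continue  # not an ACAP directive
        value = value.strip()
        # Strictest reading: a non-path value (e.g. "*") blocks the entire site.
        prefixes.append(value if value.startswith("/") else "/")
    return prefixes


def may_index(path: str, prefixes: list[str]) -> bool:
    """A page may be indexed only if no ACAP-derived prefix covers it."""
    return not any(path.startswith(p) for p in prefixes)


# Illustrative file: note that even the "allow" line ends up blocking /news/.
SAMPLE = """User-agent: *
Disallow: /admin/
ACAP-disallow-crawl: /archive/
ACAP-allow-crawl: /news/
"""

if __name__ == "__main__":
    rules = acap_disallowed_prefixes(SAMPLE)
    for page in ("/news/a.html", "/archive/x.html", "/sports/b.html"):
        print(page, "indexable" if may_index(page, rules) else "dropped")
```

The design choice is deliberately blunt: the crawler never has to interpret what an ACAP line means, only notice that it is there, which is exactly why publishers would see their pages vanish.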
Filed Under: publishers, robots.txt, search engines
Companies: associated press, google, microsoft, yahoo
Comments on “Search Engines Should Ignore Bossy Publishers”
Publishers may not actually own the content.
James Grimmelman wrote in his report “All in all, it’s an interesting start. I’m concerned that the publishers will soon argue that failure to respect every last detail expressed in an ACAP file will constitute automatic copyright infringement, breach of contract, trespass to computer systems, a violation of the Computer Fraud and Abuse Act (and related state statutes), trespass vi et armis, highway robbery, land-piracy, misappropriation, alienation of affection, and/or manslaughter.”
I reiterate that these DRM schemes to control access to content fail to consider the fact that the content may not even be owned by the content distributor. Further, if the content turns out not to be owned by the distributor, there appears to be no mechanism for disabling this DRM technology.
Basically, we are devolving into an economic/legal system where a content distributor can assert ownership without proof and can take adverse action against a so-called “infringer” without due process.
Just looked up how a robots.txt file works (never had to use one before). It seems pretty adaptable already. You can tell specific search engines what they can't look at, down to a specific file. Why do they need to create a new one? At most, I'd say add an allow function to say that these search engines are allowed despite the disallow *.
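As a sketch of the per-crawler targeting described above, the snippet below uses Python's standard-library `urllib.robotparser` to check a small robots.txt that blocks every crawler except one. "ExampleBot", "OtherBot", and the example.com URL are hypothetical names chosen for illustration; an empty `Disallow:` line in a named group is the original convention for "nothing is disallowed" for that crawler.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: one named crawler gets full access, everyone else is blocked.
# "ExampleBot" is a hypothetical user-agent used only for this sketch.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse in-memory lines instead of fetching a URL

URL = "https://example.com/articles/story.html"
for agent in ("ExampleBot", "OtherBot"):
    allowed = parser.can_fetch(agent, URL)
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Run as-is, this should report ExampleBot as allowed and OtherBot as blocked, since the named group matches first and the `*` group is only consulted for crawlers with no group of their own.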
Kept searching and found that Google had indexed the NY Times robots.txt file.
“just announce that every line of an ACAP file will be interpreted as the equivalent of a ‘Disallow’ line in a robots.txt file.”
That’s exactly what I would do.
I’ve been saying the search engines should ignore bossy publishers since Techdirt started mentioning the issue. You’ll find comments to that effect after many of your articles.
Glad to see you’ve finally come around to suggesting that yourself.
> If search engines simply refused to include
> ACAP-restricted pages in their index, publishers
> would quickly realize that those old robots.txt
> files aren’t so bad after all.
No, they’d just go pay… err “convince” Congress to pass a law requiring search engines to use ACAP data and/or that treating it as a “disallow” is an unlawful restraint of trade or somesuch nonsense.