Search Engines Should Ignore Bossy Publishers
from the disallow dept
Of course, this shouldn't surprise us, because it's not really a technical standard at all. Robots.txt works just fine for almost everyone, and search engines aren't clamoring to replace it. Rather, some publishers are using the trappings of a technical standard to try to micromanage the uses to which search engines put their content, and they're laying the groundwork for lawsuits if search engines fail to heed the demands embedded in ACAP files. Not only are the rules vague and confused, but the "standard" also helpfully notes that the rules "may change or be withdrawn without notice." In other words, a search engine that committed to complying with ACAP directives would be setting itself up to have its functionality micromanaged by the publishers who control the ACAP specifications.
Luckily, as Mike pointed out on Friday, search engines have the upper hand here. So here's my suggestion for search engines: instead of trying to comply with every nitpicky detail of the ACAP standard, just announce that every line of an ACAP file will be interpreted as the equivalent of a "Disallow" line in a robots.txt file. Websites would discover pretty quickly that posting ACAP directives on their sites just caused their content to disappear from search engines. As much as they might bluster about search engines "stealing" their content, the reality is that they can't afford to give up the traffic that search engines send their way. If search engines simply refused to include ACAP-restricted pages in their indexes, publishers would quickly realize that those old robots.txt files aren't so bad after all.
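For the curious, here is roughly what that policy might look like in a crawler. This is only a minimal sketch, not anything a real search engine has shipped. It assumes ACAP directives show up as robots.txt-style lines whose field names begin with "ACAP-", and the function name acap_as_disallow and the fall-back-to-"/" rule are made up for illustration. It also ignores User-agent grouping entirely to keep things short.

```python
def acap_as_disallow(robots_txt: str) -> list[str]:
    """List the paths to skip when every ACAP line counts as a Disallow.

    Hypothetical helper: assumes ACAP fields start with "ACAP-" and
    ignores User-agent grouping for brevity.
    """
    blocked = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "disallow" and value:
            # Ordinary robots.txt rule: honor it as written.
            blocked.append(value)
        elif field.startswith("acap-"):
            # Any ACAP line, permissive or not, becomes a Disallow.
            # Assumption: if the first token is not a path, block the whole site.
            tokens = value.split()
            path = tokens[0] if tokens and tokens[0].startswith("/") else "/"
            blocked.append(path)
    return blocked
```

Note that under this reading even a line meant to grant permission ends up hiding the page it names, which is rather the point.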