Search Engines Should Ignore Bossy Publishers

from the disallow dept

James Grimmelmann has an in-depth look at ACAP, the new "standard" for website access control that we discussed last Friday. I put "standard" in scare quotes because, as Grimmelmann points out, the specs clearly weren't written by people with any experience in writing technical standards. While a well-written standard will very precisely specify which behaviors are required, which are prohibited, and under what circumstances, the ACAP spec is full of vague directives and confusing terminology. Some parts of the standard are apparently designed to "only be interpreted by prior arrangement." Also, despite the "1.0" branding, the latest version of the specification has several sections that are labeled "not yet fully ready for implementation." It is, in short, a big mess.

Of course, this shouldn't surprise us, because it's not really a technical standard at all. Robots.txt works just fine for almost everyone, and search engines aren't clamoring to replace it. Rather, some publishers are using the trappings of a technical standard to try to micromanage the uses to which search engines put their content, and they're laying the groundwork for lawsuits if search engines fail to heed the demands embedded in ACAP files. Not only are the rules vague and confused, but the "standard" also helpfully notes that the rules "may change or be withdrawn without notice." In other words, a search engine that committed to complying with ACAP directives would be setting itself up to have its functionality micromanaged by the publishers who control the ACAP specifications.

Luckily, as Mike pointed out on Friday, search engines have the upper hand here. So here's my suggestion for search engines: instead of trying to comply with every nitpicky detail of the ACAP standard, just announce that every line of an ACAP file will be interpreted as the equivalent of a "Disallow" line in a robots.txt file. Websites would discover pretty quickly that posting ACAP directives on their sites just caused their content to disappear from search engines. As much as they might bluster about search engines "stealing" their content, the reality is that they can't afford to give up the traffic that search engines send their way. If search engines simply refused to include ACAP-restricted pages in their index, publishers would quickly realize that those old robots.txt files aren't so bad after all.
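For the curious, here is roughly what that policy could look like in a crawler. This is only a sketch under stated assumptions, not how any real search engine works: it assumes ACAP directives show up as "ACAP-"-prefixed lines in a robots.txt-style file, the directive names in the sample are illustrative, and the function names (acap_paths_as_disallow, may_index) are made up for this example. Any path named on an ACAP line, whether the publisher meant to allow or restrict it, simply gets treated as disallowed.

```python
def acap_paths_as_disallow(text):
    """Collect every path named on an ACAP-prefixed line, whatever the directive says."""
    blocked = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if not field.strip().lower().startswith("acap-"):
            continue  # ordinary robots.txt lines are left to the normal parser
        tokens = value.split()
        # Assumption: the first token of the value is the path; any qualifiers after it are ignored.
        if tokens and tokens[0].startswith("/"):
            blocked.append(tokens[0])
    return blocked


def may_index(path, blocked):
    """A page stays in the index only if no ACAP-named path is a prefix of it."""
    return not any(path.startswith(prefix) for prefix in blocked)


# Hypothetical sample file; the ACAP field names here are illustrative only.
SAMPLE = """\
User-agent: *
Disallow: /private/
ACAP-crawler: *
ACAP-allow-crawl: /news/
ACAP-disallow-crawl: /archive/
"""

blocked = acap_paths_as_disallow(SAMPLE)
print(blocked)
print(may_index("/news/today.html", blocked))
print(may_index("/about.html", blocked))
```

Note that the sketch deliberately skips the plain "Disallow: /private/" line. Regular robots.txt rules would still go through whatever parser the crawler already uses (Python's urllib.robotparser, for instance), so publishers who stick with robots.txt see no change at all.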



Reader Comments


  •  
    Steve R. (profile), Dec 6th, 2007 @ 10:26am

    Publishers may not actually own the content.

    James Grimmelmann wrote in his report, "All in all, it’s an interesting start. I’m concerned that the publishers will soon argue that failure to respect every last detail expressed in an ACAP file will constitute automatic copyright infringement, breach of contract, trespass to computer systems, a violation of the Computer Fraud and Abuse Act (and related state statutes), trespass vi et armis, highway robbery, land-piracy, misappropriation, alienation of affection, and/or manslaughter."

    I reiterate that these DRM schemes to control access to content fail to consider the fact that the content may not even be owned by the content distributor. Further, if the content is not owned by the distributor and this is discovered, there appears to be no mechanism for this DRM technology to be disabled.

    Basically, we are devolving into an economic/legal system where a content distributor can assert ownership without proof and can take adverse action against a so-called "infringer" without due process.

     


  •  
    Chronno S. Trigger, Dec 6th, 2007 @ 10:28am

    Robots.txt

    Just looked up how a robots.txt file works (never had to use one before). It seems pretty adaptable already. You can tell specific search engines what they can't look at, down to a specific file. Why do they need to create a new one? At most I'd say add an allow function to say that these search engines are allowed despite the disallow *.

    Kept searching and found that Google had indexed the NY Times robots.txt file.

    "just announce that every line of an ACAP file will be interpreted as the equivalent of a "Disallow" line in a robots.txt file."

    That's exactly what I would do.

     


  •  
    Bob C, Dec 6th, 2007 @ 11:10am

    Yawn...

    I've been saying the search engines should ignore bossy publishers since Techdirt started mentioning the issue. You'll find comments to that effect after many of your articles.

    Glad to see you've finally come around to suggesting that yourself.

     


  •  
    BTR1701, Dec 6th, 2007 @ 1:05pm

    ACAP

    > If search engines simply refused to include
    > ACAP-restricted pages in their index, publishers
    > would quickly realize that those old robots.txt
    > files aren't so bad after all.

    No, they'd just go pay... err "convince" Congress to pass a law requiring search engines to use ACAP data and/or that treating it as a "disallow" is an unlawful restraint of trade or somesuch nonsense.

     


