Are Yahoo & The AP Manipulating Comments? Or Are They Just Really Bad At The Internet? [Updated]

from the do-you,-uh,-yahoo? dept

Someone who prefers to remain anonymous sent over this story about how Associated Press stories hosted on Yahoo News appear to have tons of comments carried over from old stories. It's not entirely clear what's happening (I have my suspicions, explained further down), but it appears that when new stories on certain topics show up, Yahoo simply copies over older comments from previous stories on similar or related topics. The comments look as if they're about the story posted -- and the only way you can tell they're not is if you notice the date:
I'd go from one Yahoo article to another and notice that regardless of the subject matter, the first user comment was always the same -- at least on AP articles covering the Israeli/Palestinian conflict. The comment that kept reappearing was posted by "Robert" and it was a one liner. "Hamas is now in control of the Gaza Strip after winning an election there against Abbas Palestinian Authority." That was it. Fair enough -- I've got no quarrel with the messenger or the message. But somehow that one comment generated an incredible 184 responses and, last I checked, readers had given it 3212 thumbs up and 2525 thumbs down.

I got a little curious about why Robert's one liner had generated so much controversy. I've written hundreds of articles and never got anywhere near that kind of attention. Frankly, I was full of envy. How did 'Robert' pull this off with one miserly line? Then I noticed the strangest thing: it was dated March 09, 2010. The comment was two months old and was the lead comment of 40,000 responses. That seemed a little high considering the fact that the AP article I was reading had only been posted for thirty minutes.

What were Yahoo and AP up to? The answer is simple; they were porting comments from one article to another and, in this particular case, they've been doing it for two months.
Oddly (and inexplicably), the author of that post, Ahmed Amr, doesn't link to Yahoo to show this, but it's not hard to find. Here's a story published on June 3rd, 2010 at 9:19pm. Yet there's that same first comment, from March 9th at 12:47am. And here's a story published on May 6th at 1:09pm with identical comments, also beginning with the March 9th comment. To let you see what both pages look like before they change (I'll explain in a second why I think they will), I've turned them into PDFs, which you can see below (you may have to download them or view them full screen and scroll down to see the comments at the bottom):


I've also looked around and found very similar behavior on other stories. While Amr suggests there's something nefarious going on, with the AP "manipulating" comments (and he specifically calls out the AP reporters he believes are part of this), I'm going to guess that this is more typical (and embarrassing) incompetence on Yahoo's part rather than malice.

Take a look at the two links I put above to the Yahoo stories. The URLs (as found by a quick search for the comment string Amr mentioned in his post) are as follows:
  • http://news.yahoo.com/s/ap/ml_israel_palestinian
  • http://news.yahoo.com/s/ap/20100506/ap_on_re_mi_ea/ml_israel_palestinians
Notice something similar? The last bit of each URL string is essentially the same: "/ml_israel_palestinian". The only difference is that the second URL, for the story from May 6th, inserts two additional directories, the first of which is the date of publication. We already know that, in total disregard of the basic principles of how the internet can and does work, the AP limits how long its partners can host AP articles. I believe most sites can host the articles for a month and then need to take them down completely. On most sites, what happens next is that you get a 404 "page not found" error (to this day, I can't figure out why they don't at least point you to a place where you can find the missing article). Yahoo, however, appears to recycle its URLs, presumably to keep them simple and understandable. So any basic story about the Israeli-Palestinian conflict might appear under that first URL. For all I know, by the time you're reading this, it's an entirely different story than the one that was published on June 3rd.

At some point after publication, Yahoo moves the story to a new date-defined directory and frees up the original URL for the next story on that particular topic, breaking the basic principle that a link to a news story should be a link to that news story alone. If this seems stupid, confusing to users, and destructive to the very idea of the "link economy," where earned or passed links have value, you're right. But take that up with Yahoo and the Associated Press.
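To make the recycling scheme described above a bit more concrete, here is a toy model in Python. It is only a sketch of the behavior this post infers from the outside, not Yahoo's actual code: the `NewsSite` and `Story` classes, the `publish` method, and the exact URL layout (a short topic URL for the current story, `/<date>/<section>/<slug>` for the older one) are hypothetical, loosely patterned on the two URLs listed above.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical base path, patterned on the URLs listed above.
BASE = "http://news.yahoo.com/s/ap"


@dataclass
class Story:
    slug: str     # topic slug shared by every story on the subject
    date: str     # publication date as YYYYMMDD
    section: str  # section directory, e.g. "ap_on_re_mi_ea"
    title: str


@dataclass
class NewsSite:
    live: Dict[str, Story] = field(default_factory=dict)      # short URL -> current story
    archived: Dict[str, Story] = field(default_factory=dict)  # dated URL -> older story

    def publish(self, story: Story) -> str:
        """Put a new story at the short topic URL, archiving whatever was there."""
        live_url = f"{BASE}/{story.slug}"
        previous = self.live.get(live_url)
        if previous is not None:
            # The older story moves to a date-defined directory...
            dated_url = f"{BASE}/{previous.date}/{previous.section}/{previous.slug}"
            self.archived[dated_url] = previous
        # ...and the short URL now serves the newer story.
        self.live[live_url] = story
        return live_url


site = NewsSite()
link = site.publish(Story("ml_israel_palestinian", "20100506", "ap_on_re_mi_ea", "May 6 story"))
site.publish(Story("ml_israel_palestinian", "20100603", "ap_on_re_mi_ea", "June 3 story"))
print(site.live[link].title)  # June 3 story: the old link now shows a different article
print(list(site.archived))    # the May 6 story now lives under a dated URL
```

Anyone who linked to the short URL on May 6th is now sending readers to the June 3rd story, which is exactly the broken-link behavior complained about above.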

Of course, here's where the real tech incompetence comes in: it appears that Yahoo News' comment system doesn't account for Yahoo's own URL recycling. It associates the comments with that last bit of the URL string, "/ml_israel_palestinian", so the same comments appear every time that string is used as the final part of a URL. It's bizarre that Yahoo would do this, but apparently, that's how Yahoo rolls.

Amr suggests that this is part of a planned bit of "corporate fraud" by Yahoo and the AP, perhaps to make it look like certain stories are getting a hell of a lot more comments than they are. He also floats other conspiracy theories involving pro-Israeli operatives, saying that, as far as he can tell, this only happens on AP stories concerning the Israeli/Palestinian crisis. I don't think Amr tried very hard to find other explanations. On my very first attempt to find an example on an entirely different subject, I found identical behavior. I just picked a popular topic likely to produce multiple stories over multiple days, the BP oil spill in the Gulf, and then looked for an AP story on it hosted by Yahoo News... Bingo.

The first news story I found was published on June 3rd at 2:28pm, but the first comment on the story? Why, it's from May 1st at 2:06am. And the URL? The string ends with "us_gulf_oil_spill_947." You can find the identical comments on this story, which was published May 21st and whose URL ends with the string "us_gulf_oil_spill", suggesting that Yahoo's comment system also ignores trailing numbers in that final URL segment when matching up comments.

And here's another story, about the White House's response to the oil spill. Published June 3rd at 11:57pm. First comment? May 10, 2010, 12:58pm. URL string? "us_gulf_oil_spill_washington_9". And here's a story from May 17th with the identical comments at the end and the closing URL string "us_gulf_oil_spill_washington_1." Yup, Yahoo seems to just match up comments using a pretty simple key derived from the URL.
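The matching rule these examples point to can be sketched in a few lines of Python. This assumes, as the post infers but cannot confirm, that the comment system keys each thread on the final URL segment with any trailing `_<digits>` suffix removed. `comment_key`, `CommentStore`, and the `BASE` prefix are hypothetical names; only the slugs themselves come from the stories above.

```python
import re
from typing import Dict, List
from urllib.parse import urlsplit

# Hypothetical prefix; only the final path segments come from the post.
BASE = "http://news.yahoo.com/s/ap"

# Assumed rule: drop a trailing "_<digits>" suffix from the last segment.
_TRAILING_NUMBER = re.compile(r"_\d+$")


def comment_key(url: str) -> str:
    """Guess the comment-thread key for a story URL."""
    path = urlsplit(url).path.rstrip("/")
    last_segment = path.rsplit("/", 1)[-1]
    return _TRAILING_NUMBER.sub("", last_segment)


class CommentStore:
    """Stores comments per derived key instead of per story (the suspected bug)."""

    def __init__(self) -> None:
        self._threads: Dict[str, List[str]] = {}

    def add(self, story_url: str, comment: str) -> None:
        self._threads.setdefault(comment_key(story_url), []).append(comment)

    def for_story(self, story_url: str) -> List[str]:
        # Every URL that reduces to the same key gets the same thread.
        return list(self._threads.get(comment_key(story_url), []))


store = CommentStore()
store.add(f"{BASE}/us_gulf_oil_spill", "comment left on the May 21 story")
print(store.for_story(f"{BASE}/us_gulf_oil_spill_947"))           # ['comment left on the May 21 story']
print(store.for_story(f"{BASE}/us_gulf_oil_spill_washington_9"))  # [] (a different key)
```

Under this assumption, "us_gulf_oil_spill_947" and "us_gulf_oil_spill" share one thread, while the two "washington" stories share a separate one, matching what shows up on the pages. Keying comments on a unique per-story identifier instead of the topic slug would avoid the collision entirely.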

You can see all of that below as embedded PDFs:






So while it's easy and tempting to ascribe this to "manipulation" and suggest malice on the part of the AP or Yahoo or whoever else (Israeli operatives? Seriously?), it seems pretty clear that this is more due to technical incompetence on Yahoo's part, somewhat driven by the AP's ridiculous "delete this story after x days" licensing policies.

Update: The AP got in touch to make it clear that this is entirely Yahoo's incompetence and not its own:
The Associated Press distributes news content to Yahoo! News, but the display of AP stories and the curating of comments are entirely up to Yahoo!
That's undoubtedly true, but in the comments we've heard from multiple people who work at news sites that license AP content, and they note that AP has a weird feed process: it gives each story a simple slug like the ones used above so that it can force-update stories, which often means readers see a story change completely over the course of the day. This is clearly a Yahoo issue, but AP's policies don't help.

Reader Comments

  1. Hephaestus, 4 Jun 2010 @ 4:38pm

    Re: Re: Re: Props

    Actually I think the razor is Occam's razor. ...

    When competing hypotheses are equal in other respects, the principle recommends selection of the hypothesis that introduces the fewest assumptions and postulates the fewest entities while still sufficiently answering the question.
