Why Is Google Punishing Sites That Publish Full RSS Feeds? [UPDATED]

from the not-good-at-all dept

Last year, we explained why full text RSS feeds make sense. You can read the whole thing, but the short version is that they make stories easier to read, and that means more people actually read the full stories and are willing to discuss them, share them and get others interested in reading as well. It just makes the reading experience that much better. We've always had full text RSS feeds, and we're not about to change that. However, it appears that Google may be punishing sites that have full text feeds. A concerned reader pointed us to the news that the magazine Mental Floss has reluctantly ditched its full text feeds because Google banned the site and told them the only way to get back in was to get rid of the full text feeds.

Update: Matt Cutts from Google has responded in the comments and explained what happened. It turns out that, despite the original post, the ban had nothing to do with full text RSS feeds; the site had been hacked. I'm glad that's been cleared up now (and thanks to the multiple Google employees who quickly responded to this post).

The "problem," according to Google, was that plenty of sites were republishing Mental Floss's feeds, and Google's anti-spam algorithm supposedly treats that as an indication of spam. Of course, rather than figuring out which is the real site, it simply bans them all. This concerns me for a variety of reasons. The reason we publish a full text RSS feed is to make it easier for anyone to do what they want with our content, even if that means republishing it. A bunch of sites republish our RSS feed (some apparently in the mistaken belief that this would upset us over the "copyright infringement"). Those sites are mostly harmless: either they get no traffic at all, or they end up driving more traffic to us. That's great. But it's a bit troubling that Google might make us disappear from its entire index just because we publish a full text feed and someone else uses that feed exactly as it's meant to be used.

I could understand if the deletion of Mental Floss from the index was simply a mistake, and upon being alerted to it, they restored the site. But the fact that Google's response was to tell Mental Floss to ditch the full text feeds is worrisome. What makes this even more ridiculous is that Feedburner, which is owned by Google, tells people that full text feeds are better. So, you have part of Google telling people to use full text feeds, and another part of Google punishing them for doing so.


Reader Comments

  1.  
    Don't Tell Anyone I'm An SEO, Jul 21st, 2008 @ 1:19pm

    I doubt "Google" told them this. Google really isn't in the business of handing out this kind of information, which, if you're in the SEO biz, can be a matter of frustration. They just said "we were advised to cut off the spigot", not "we were advised, by Google, to cut off the spigot". Probably some SEO told them to do it.

    Google doesn't like "duplicate content." Publishing full RSS feeds saves the spammers the trouble of having to use scrapers to get all your content and throw it up on their sites, thus creating duplicate content. They weren't being penalized for having full RSS feeds; that's just a by-product of being an easy target.

    Google does try to identify the "real site" (the origin of the content) and give that site the credit. The problem is that it's done algorithmically and isn't perfect. Shedding light on cases like this is good, because it gives Google the opportunity to correct its errors. TechDirt.com as a domain is probably a strong enough source that Google has no trouble figuring out the origin of the content. Mental Floss obviously wasn't as fortunate.

  2.  
    some old guy, Jul 21st, 2008 @ 1:21pm

    what part of RSS do they not get?

    RSS = Really Simple SYNDICATION. Seriously, the whole point of it was to make machine-to-machine syndication easier. As far as I am concerned, "sample" feeds are advertising.

  3.  
    Matt Cutts, Jul 21st, 2008 @ 1:30pm

    The site was hacked. RSS has nothing to do with it.

    We emailed this site on July 7th to let them know exactly why we were removing the site; looks like it got hacked and was showing nasty content. It has *nothing to do* with full-text RSS feeds.

    Here's some of the email that we sent on July 7th to this site owner:

    Dear site owner or webmaster of mentalfloss.com,

    While we were indexing your webpages, we detected that some of your pages were using techniques that are outside our quality guidelines, which can be found here: http://www.google.com/webmasters/guidelines.html. This appears to be because your site has been modified by a third party. Typically, the offending party gains access to an insecure directory that has open permissions. Many times, they will upload files or modify existing ones, which then show up as spam in our index.

    The following is some example hidden text we found at, e.g., http://www.mentalfloss.com/blogs/archives/2192:

    economics times india
    The application fee is collected by the JUPAS economics times india on behalf of the 9 participating institutions and is not refundable or transferable to another year.
    free 2004 income tax forms
    Request for use of Accumulated Surplus must be signed by the Hon Fin Sec/Treasurer and countersigned by the President of the Union/Club and submitted to OSA for approval. According to the agreement, Castrol will use Deutsche Bank's complete end-to-end payment and collection solution, as well as db-eBills - the Bank's innovative electronic invoice presentment and payment (EIPP) solution. The Internet's largest source of legitimate, copyrighted 100% digital sheet music since 1997, we now have over 10,000 songs for instant download! For extremely poor families, free 2004 income tax forms provides emergency assistance, while the conditionalities promote longer-term investments in human capital. Australia order viagra online clinic uk in Australia order viagra without a prescription in Australia order generic viagra and other prescription drugs online in Australia viagra order by phone in Australia viagra order on line in Australia order cheap viagra in Australia levitra cialis viagra comparison online order in Australia buy online order viagra in Australia order generic viagra in Australia order viagra overnight in Australia order by phone generic viagra in Australia viagra no prescr
    chase mastercard rewards program
    A device which forms a digitised image of a human fmger print for the purpose of biometric authentication. T subject to search without a warrant while on prison property, according to the lawsuit. It is rare to find an amateur player using this move in a poker game, so if your opponents see you using this move they can be fairly sure you know how to play good poker, and may think twice about bluffing you out of future pots. Download one of listed teens for chase mastercard rewards program taylor torrents or choose from category bit torrent downloads listed here to download your favorite torrent at torrentz. ACI Worldwide Eastern Europe Development is the fast-growing Romanian branch of ACI Worldwide.
    bad credit personal finance loans

    [...]

    In order to preserve the quality of our search engine, we have temporarily removed some of your webpages from our search results. Currently pages from mentalfloss.com are scheduled to be removed for at least 30 days.

    We would prefer to have your pages in Google's index. If you wish to be reconsidered, please correct or remove all pages (may not be limited to the examples provided) that are outside our quality guidelines. One potential remedy is to contact your web host technical support for assistance. For more information about security for webmasters, see http://googlewebmastercentral.blogspot.com/2007/09/quick-security-checklist-for-webmasters.html.

    When you are ready, please visit https://www.google.com/webmasters/tools/reinclusion?hl=en to learn more and submit your site for reconsideration.

    Sincerely,
    Google Search Quality Team

  4.  
    Jim Gaudet (profile), Jul 21st, 2008 @ 1:40pm

    Matt has the answer!

    It's the big companies that always get hammered. I personally think that Google is pretty fair.

    I read another article related to this one. Basically, the idea is that publishers would have some sort of authentication with Google, so Google would know which source is the authority on a subject, and that site would get the higher ranking. I think this is a good idea too.

  5.  
    Anonymous Coward, Jul 21st, 2008 @ 1:41pm

    Re:

    The linked article seems to indicate that someone other than Google told them to do this. It also clearly states that they are in the middle of Google's reinstatement process. It is likely that as soon as a human being at Google gets to their case, they will be restored.

    The only news here is that Google's algorithm isn't perfect and, through a fluke, it banned a perfectly legit site. Google is still not evil.

  6.  
    Matt Cutts, Jul 21st, 2008 @ 1:44pm

    By the way, full-text RSS feeds are great

    And as a software engineer at Google, I can attest that I happily use full-text RSS feeds on my own site. Go full text feeds! :)

  7.  
    Ben Robinson, Jul 21st, 2008 @ 1:46pm

    Unlikely

    Your write-up makes it sound like Mental Floss was directly in touch with Google; is this the case? It is pretty rare for Google to actually talk to site owners about specific problems, and the article itself doesn't say anything about talking to Google. Google doesn't treat the number of sites that duplicate content as an indicator of spam; it uses other, unrelated information to make a best guess at the original source of the content and marks the rest as duplicates (which are then given little or no weight). If a site (like a scraper site republishing your feed) has nothing but duplicate content, it is likely to be marked as spam and delisted (or at least severely handicapped in the search engine rankings).

    Google's algo for duplicate content is usually very good at detecting the original source and weeding out the scraper sites. Now, this could be a mistake, with Google failing to pick the right site, but I did a very, very cursory check and noticed one glaring thing:

    CNN.com has published quite a few Mental Floss articles in full, in what looks like some sort of cross-promotional arrangement. Google gives a lot of weight to CNN and other big news sites as trusted originating sources. I think it is much more likely that Google's algo is viewing CNN as the original and Mental Floss as the duplicate.

    As I said, I only took a cursory look, but it is highly unlikely that scraper sites are causing the delisting. The cause is much more likely to be duplicate content on sites like CNN, or even some other violation of Google's guidelines.

    Unless you start selling links or doing something else against Google's guidelines, Techdirt has nothing to worry about from publishing full feeds.

  8.  
    some old guy, Jul 21st, 2008 @ 1:48pm

    oooooh

    Well, I feel better now knowing that there was no truth in the ugly rumours.

  9.  
    Will Pearson from mental_floss, Jul 21st, 2008 @ 2:09pm

    A note from mentalfloss.com

    I was just informed of this post/conversation and wanted to chime in. I'm the president of mental_floss and simply wanted to clear up some confusion. We did not claim that Google instructed us to tweak our RSS feed and we are not blaming Google for any of this. For some reason I did not see the note from Google posted above and so we did not realize why we'd been pulled from their search.

    Once we realized we were no longer in Google's natural search, we immediately began taking steps to try and figure out what was going on. When we asked a few others with experience in this area, they suggested that we make sure no one was lifting our content from our RSS feed and publishing it in full on their site. We discovered another site that was doing so and decided to tweak our RSS feed just in case that was the cause.

    We are continuing to look into this and will resolve the problem Matt has pointed out.

    It's very important to us that we are included in Google's index again, so we'll work quickly to get this fixed. It's unfortunate, because we run a clean operation, so I hate that this has happened.

    But again, this is not Google's fault. They've simply recognized a problem and we'll work to fix it.

    Matt, if you'd be willing to discuss, I would love to have a conversation with you. Thanks for your attention to these matters.

    Thanks,

    Will

  10.  
    Mark, Jul 21st, 2008 @ 2:11pm

    Interesting

    On my iGoogle homepage, the update doesn't show up in the Techdirt RSS feed, only the original posting. If I hadn't found this story so unusual and clicked through to read it in detail, I might never have known that the original article was wrong.

  11.  
    Matt Cutts, Jul 21st, 2008 @ 2:25pm

    Re: A note from mentalfloss.com

    Hi Will, I've asked someone to do a fresh look at mentalfloss.com and I suspect that you'll be back in Google's index again soon. I tried to leave a couple comments on the original post on mentalfloss.com, and those comments had my email info in case you'd like to contact me directly. But the short answer is just to make sure that any hacked content is removed. Yahoo still shows some of the hacked content, e.g. http://search.yahoo.com/search?p=site:mentalfloss.com+fiorcet so I'd just make sure that any pages like that are hack-free and you should be in good shape in short order.

    Best wishes,
    Matt

    P.S. Mike, thanks for the quick update on this story.

  12.  
    Will Pearson from mental_floss, Jul 21st, 2008 @ 2:25pm

    one other note from mental_floss

    While we were headed down the wrong path in solving this problem, I'd like to thank Mike for looking out for us. And thanks to the Google employees who commented with their information. I'm not sure why we didn't get the e-mail referenced above but I'm sure that was a problem on our end. This definitely puts us a step closer to resolving this issue.

  13.  
    Tony Comstock, Jul 21st, 2008 @ 2:46pm

    Search Team Letter

    Wow. Wish there had been something like that when our site got hacked and our Google index got polluted with spam. We were able to track it down, but it took about a month to get our index back in order!

    For those of us who are just trying to get people who want to see our content to our website, without wasting the time of people who don't want to see it, meaningful search results are in the best interest of ourselves, our visitors, and Google. For example, we're still getting way, way too much Nina Hartley traffic. It's not valuable to us, and probably an annoyance to people who search for [nina hartley] and end up at the website of an erotic documentary company.

    I wonder if there are other ways Google could send e-mail to site owners to help them identify problems or otherwise tune their websites. Yes, I know about all the documentation available from Google, but letters like the one above that address site-specific problems could be helpful to everyone.

  14.  
    VENKATAKRISHNA NALAMOTHU, Jul 21st, 2008 @ 5:36pm

    I am facing this problem

    Many sites are publishing my full RSS feeds, and some of them rank ahead of me in Google search results. I even contacted Google about this, with supporting facts, but they advised me to contact a copyright board or something, which is impossible for a small publisher like me in India.

  15.  
    Rich Pearson, Jul 21st, 2008 @ 6:56pm

    Is linking back the answer?

    Fantastic response, and kudos to Mike for elevating this.

    To ensure that the full-text RSS movement is not slowed, is it correct to assume that you can avoid duplicate content problems, or the situation @venkatakrishna is facing, as long as anyone using your content links back to you?

  16.  
    Matt Cutts, Jul 21st, 2008 @ 7:27pm

    Linking back helps

    Rich, I heartily recommend full-text RSS; it's what I use on my own blog. Here are a couple pointers to best practices that would help Venkatakrishna or anyone else avoid duplicate content issues:
    http://googlewebmastercentral.blogspot.com/2008/06/duplicate-content-due-to-scrapers.html
    http://www.vanessafoxnude.com/2008/05/14/ranking-as-the-original-source-for-content-you-syndicate/

    Hope that helps,
    Matt

  17.  
    Liberty Newsprint, Jul 22nd, 2008 @ 12:16pm

    We Love Republishing TECH DIRT

    Hi All, from my point of view RSS feeds are useless unless they are full-text. The exception is feeds that come from within your own organization; you should truncate those so that folks can choose to read only what is relevant to their job. When you WANT people to read something, you give it straight to them without a bunch of hoops to jump through. I publish an online feed newspaper at http://www.LibertyNewsprint.com using only full-text feeds; you can print the paper out and read it offline. I don't get lots of traffic yet, but I find that people appreciate the archiving quality of the .PDF document.

  18.  
    مملكة الرومانسيه, Aug 29th, 2008 @ 4:31am

    Romance Kingdom forums: software, computer information, internet software, video editing software, humor forum, general forums, kitchen forum, family and child forum, hacking forum, mobile forum, classical poetry forum, folk poetry forum (abudhiyah, darmi), songs forum, Arabic songs forum, Iraqi songs forum, Western and foreign songs forum

  19.  
    مملكة الرومانسيه, Aug 29th, 2008 @ 4:31am

    Romance Kingdom forums for lovers of romance

    Romance Kingdom forums: software, computer information, internet software, video editing software, humor forum, general forums, kitchen forum, family and child forum, hacking forum, mobile forum, classical poetry forum, folk poetry forum (abudhiyah, darmi), songs forum, Arabic songs forum, Iraqi songs forum, Western and foreign songs forum

  20.  
    order viagra, Oct 20th, 2010 @ 2:19am

    health