DailyDirt: Computers Are Editing Our Double-Plus-Ungood Content

from the urls-we-dig-up dept

More and more digital media is being edited and prioritized in datacenters by intangible algorithms. As usual, this can be good and bad, depending on how the technology is used. On the one hand, algorithms can do laborious tasks that humans don't want to do. But at the same time, algorithms might introduce all kinds of errors or inadvertent biases on a scale that no group of humans could ever accomplish without automation. Here are just a few links on bots tinkering with online content. If you'd like to read more awesome and interesting stuff, check out this unrelated (but not entirely random!) Techdirt post via StumbleUpon.

Reader Comments

  1.  
    zip, Apr 7th, 2014 @ 7:47pm

    vandalism-by-bot on Wikipedia

    "the biggest bot job on Wikipedia is detecting vandalism."

    And the second-biggest bot "job" on Wikipedia is perpetrating vandalism.

    I've seen much content, often high-quality content, deleted by bots. It seems to follow a familiar pattern: a new, unregistered person comes in and writes a substantial addition to an article, but due to a minor violation of some arbitrary rule, everything the person wrote is automatically deleted. (And oftentimes, new writers don't return to "defend" their edits.)

    Here's just one example of vandalism-by-wikibot that caught my attention:

    https://en.wikipedia.org/w/index.php?title=Skiptrace&diff=302349485&oldid=302349312

     


  2.  
    Anonymous Coward, Apr 8th, 2014 @ 12:14am

    Re: vandalism-by-bot on Wikipedia

    IMHO, in that case that's not vandalism-by-bot but correct removal of a bucket of spam links.

    When editing as an unregistered user, it's common courtesy to change only a single section -- not to make major revisions across several sections in one edit and dump a bucketload of URLs at the end of the article. That behavior is extremely common among link spammers, and it's what triggered the revert.

    Adding the URLs usually belongs in a separate edit... check the Talk page of that IP address; the reason for the revert is stated clearly there: URL link dump.

     


  3.  
    zip, Apr 8th, 2014 @ 3:35am

    Re: Re: vandalism-by-bot on Wikipedia

    "IMHO, in that case that's not vandalism-by-bot but correct removal of a bucket of spam links. When editing as an unregistered user, it's common courtesy to change only a single section -- not to make major revisions across several sections in one edit and dump a bucketload of URLs at the end of the article. That behavior is extremely common among link spammers, and it's what triggered the revert. Adding the URLs usually belongs in a separate edit... check the Talk page of that IP address; the reason for the revert is stated clearly there: URL link dump."


    Bots don't argue "reasons" -- they spit out canned responses (and enforce non-negotiable blanket rules) when triggered. In this case, the trigger was the inclusion of a single word: "myspace".

    This was the offending line that nuked everything:

    "MySpace (http://www.myspace.com)- a "self-promotion" site where people often provide substantial details about themselves"

    So because of a single line containing the URL of the home page of a highly-popular site on the 'ban' list, the entire body of work by that author was thrown out. Although well-intentioned, the bot 'crashed-and-burned' here because the bot's programmer failed to distinguish between links to personal pages on MySpace (of which there are millions) and the front page of MySpace. The bot was obviously programmed to assume that any links to MySpace (homepage or not) were put there by a spammer trying to googlebomb his personal vanity page to increase its search-engine ranking. As judge, jury and executioner, the bot pronounced that Wikipedia editor guilty of link-spamming, and as punishment, deleted not just the offending word, but all edits ever made by that person (even those that broke no "rules") going all the way back to his first appearance on Wikipedia.
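    The failure mode described here -- a substring blacklist that can't tell a banned site's front page from the millions of personal pages hosted on it -- can be sketched roughly like this. (A hypothetical illustration only, not the actual bot's code; the `BLACKLIST` set and both function names are invented for the example.)

```python
import re
from urllib.parse import urlparse

BLACKLIST = {"myspace.com"}  # hypothetical ban list

def naive_is_spam(text):
    """The flawed rule: any mention of a blacklisted domain
    anywhere in the edit triggers a revert."""
    return any(domain in text.lower() for domain in BLACKLIST)

def careful_is_spam(text):
    """A more careful rule: only flag deep links into a banned site
    (e.g. personal vanity pages), not links to its front page."""
    for url in re.findall(r'https?://[^\s)"]+', text):
        parsed = urlparse(url)
        host = parsed.netloc.lower().removeprefix("www.")
        # Flag only when the URL has a non-empty path on a banned host.
        if host in BLACKLIST and parsed.path.strip("/"):
            return True
    return False
```

    Under the naive rule, the single line quoted above (linking only to MySpace's front page) flags the whole edit; under the path-aware rule it passes, while a link like myspace.com/someusername would still be caught.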

    That's severe overkill, based on an invalid assumption and triggered by the bot's sloppily programmed ruleset. And as a result, the Wikipedia bot vandalized -- in this case permanently -- fully two-thirds of an article on an informative subject.

    But just think about it for a moment ... the Wikipedia article "Skiptrace" is about research methods used to locate people. Doesn't it seem counterproductive that Wikipedia's search-and-destroy bots would (mis)identify the URLs of these related search engines and online research tools used by investigators for data mining -- including the website of the US Post Office -- and consider them all to be "link spam" - even when they are precisely on-topic and relevant to the subject?

    I find it amusing that "Anonymous Coward" would find the Wikipedia bot's draconian enforcement action to be justified because a new user was not aware of the various customs peculiar to Wikipedia. I think this is one of the main problems with Wikipedia -- the site has become very unfriendly and unforgiving to new visitors, who are somehow expected to know a long list of esoteric rules before they ever start. Rules that are often counter-intuitive and illogical to an outsider not steeped in the "culture" of Wikipedia.

     


  4.  
    John Fenderson (profile), Apr 8th, 2014 @ 8:11am

    Re: Re: Re: vandalism-by-bot on Wikipedia

    Myspace is still "highly popular"??

     


  5.  
    Anonymous Coward, Apr 8th, 2014 @ 8:19am

    Re: Re: Re: vandalism-by-bot on Wikipedia

    It's a very poor quality edit that is correctly being reversed by the bot.

    The biggest problem with the edit is actually that that mass of material is unsourced original research; it shouldn't stay even if it's "defended" (unless the defense is adequate sourcing, which seems unlikely).

    The biggest problem with the bot program is that well-intentioned edits such as this one get reversed without a whole lot of deep clarification. This causes potential new editors to get alienated early. However, even with major efforts at mentoring, it's really rare to convert someone with a "link dump" mentality, such as that evinced by the reverted edit, into a good editor. They tend to have preconceived notions about how Wikipedia should be that are at odds with the views of most other editors.

     


  6.  
    Rekrul, Apr 8th, 2014 @ 9:47am

    You left out the biggest and worst bot editor around: YouTube's Content ID filter.

     


