Twitter Wishes 4.5 Million Osama Bin Laden-Related Tweets Into Their API Cornfield
from the tweets-or-it-didn't-happen dept
Considering Twitter was instrumental in breaking the story of Osama Bin Laden's death, it seems somewhat strange that it would also be instrumental in limiting access to one of the biggest stories of 2011, if not the decade. (Of course, we're barely into this decade, so we probably shouldn't be building these "best of" lists quite yet...) At the center of this unfortunate situation is a dataset of public tweets containing either "osama" or "bin laden," compiled using Twitter's own API.
Shortly after hearing of Bin Laden's unexpected mortal coil shuffling, Rob Domanski, who blogs as The Nerfherder, was informed of an archive of Osama Bin Laden-related tweets, all packaged up in handy XML format for use with DiscoverText software:
The datafiles were samples taken from live feed Twitter imports starting shortly after the announcement [of] Osama bin Laden’s death.
This was all for research purposes; however, Twitter quickly shut down the project citing their Terms of Service (TOS) Agreement.
- Twitter searches for "bin laden" (647,585 documents, 505 MB)
- Twitter searches for "osama" (586,665 documents, 451 MB)
Stuart Shulman of DiscoverText had compiled the documents "using an authorized connection to Twitter via their API," which is apparently a violation of Twitter's API Terms of Service. He received an email from Twitter asking him to remove the datasets:
I'm writing about Twitter data being offered for sale on DiscoverText. Scraping the Twitter service is prohibited by our site Terms of Service, and furthermore, resyndicating data obtained through the Twitter API is prohibited by section I.4.a of our API Terms of Service (http://dev.twitter.com/pages/api_terms).
As such, we request you remove the datasets listed at http://discovertext.com/osamabinladen.aspx and any other datasets containing Tweets offered on your site.
DiscoverText responded:
Let’s be clear. We have never sold a Tweet. The data collected through the Twitter API and shared through our system is the same publicly available data other users capture with screenshots and share on blogs, Facebook or Twitter itself. Nonetheless, the datasets we have assembled and similar samples are being taken temporarily off the Web site pending a resolution of this issue with Twitter.
Well, "temporarily" has turned into "indefinitely." As of June 1st, Shulman's dataset contained 4.5 million Osama Bin Laden-related tweets, all of which can only be marveled at as a REALLY BIG NUMBER but not shared in any usable fashion thanks to Twitter's complaint.
If it's just a "policy first" decision on Twitter's part, it seems a little short-sighted. This information was (and is) of great interest to people worldwide. Perhaps some sort of warning could have been issued instead of a full takedown, allowing Twitter to assert its position on API usage without locking up the dataset. Once the dataset exists, why block it? It's disheartening to see something with as much potential as Shulman's project get thrown under the TOS bus.