from the potentially-problematic dept
For many years we’ve written stories regarding various lawsuits over scraping the web. Without the ability to scrape the web, we’d have no search engines, no Internet Archive, and lots of other stuff wouldn’t work right either. However, more importantly, the ability to scrape the web should result in a better overall internet, potentially reversing the trend of consolidation and internet giants that silo off your info. Most often, we’ve talked about this in the context of Facebook’s case over a decade ago against Power.com. That involved a company that was trying to build a single dashboard for multiple social media companies, allowing users to log into a single interface to see content from, and post content to, multiple platforms at once. In that case Facebook relied on the Computer Fraud and Abuse Act (the CFAA), and the courts sided with Facebook, saying that because Facebook had sent Power a cease-and-desist letter, that made the access (even with the approval of the users themselves!) somehow “unauthorized.”
Over the years, we’ve pointed out how this decision and interpretation of the CFAA is one of the biggest reasons the market for social media is not as competitive as it could be. That decision effectively said that Facebook could build its own silo, in which your data checks in but it never checks out. Other tech companies — including Craigslist and LinkedIn — have brought similar lawsuits, though in LinkedIn’s case against HiQ the court cut back the earlier Power.com ruling, and basically said that it only applied to information that was behind a registration wall. Publicly available information was legal to scrape.
More recently, Facebook parent company Meta has again gone after scraping operations. Earlier this year, we noted how the company had sued a somewhat sketchy provider of “insights” into “influencers and their audiences” that had been scraping information on Facebook. And, now, the company has announced two new lawsuits against scraping companies. Once again, neither of the defendants is as sympathetic as Power, and Meta even frames these lawsuits as “safeguarding” its users’ privacy.
The first lawsuit, against a company called Octopus Data, raises all sorts of questions. Octopus offers a cloud-based service called Octoparse, which allows customers to extract web data from basically any URL without having to do any coding themselves. This is actually… really really useful? Especially for researchers. The ability to scrape and extract data from webpages is not just useful, it’s how lots of services work, including search engines. But Meta is not at all happy.
Since at least March 25, 2015, and continuing to the present, Defendant Octopus Data Inc., (“Octopus”) has operated an unlawful service called Octoparse, which was designed to improperly collect or “scrape” user account profiles and other information from various websites, including Amazon, eBay, Twitter, Yelp, Google, Target, Walmart, Indeed, LinkedIn, Facebook and Instagram.
Defendant’s service used and offered multiple products to scrape data. First, Defendant offered to scrape data directly from various websites on behalf of its customers (the “Scraping Service”). Second, Defendant developed and distributed software designed to scrape data from any website, including Facebook and Instagram, using a customer’s self-compromised account (the “Scraping Software”). Defendant’s Scraping Software was capable of scraping any data accessible to a logged in Facebook and Instagram user. And Defendant designed the “premium” Scraping Software to launch scraping campaigns from Defendant’s computer network and infrastructure. Finally, Defendant claimed to use and distribute technologies to avoid being detected and blocked by Meta and other websites they scraped.
Defendant’s conduct was not authorized by Meta and it violates Meta’s and Instagram’s terms and policies, and federal and California law. Accordingly, Meta seeks damages and injunctive relief to stop Defendant’s use of its platform and products in violation of its terms and policies.
Perhaps notably, Facebook does not try to use either the CFAA or California’s state equivalent in this case. Instead, it tosses in… a copyright claim. That’s because one of the premium services of Octoparse is that it will scrape the data and store it on its own server — and Meta argues that Octoparse violates Section 1201 of the DMCA (the anti-circumvention part) because the scraping tool has to “circumvent” Meta’s technical tools put in place to block Octoparse.
Certain user generated content is also copyright protected and users grant Meta a non-exclusive, transferable, sub-licensable, royalty-free, and worldwide license to host, use, distribute, modify, run, copy, publicly perform or display, translate, and create derivative works of that content consistent with the user’s privacy and application settings.
Meta uses technological measures designed to detect and disrupt automation and scraping and that also effectively control access to Meta’s and users’ copyright protected works, including requiring users to register for an account and login to the account before using those products, monitoring for the automated creation of accounts, monitoring account use patterns that are inconsistent with a human user, employing a reCAPTCHA program to distinguish between bots and human users, identifying and blocking of IP addresses of known data scrapers, disabling accounts engaged in automated activity, and setting rate and data limits.
Defendant has circumvented and is circumventing technological measures that effectively control access to copyright protected works and those of its users on Facebook and Instagram and/or portions thereof.
Defendant manufactures, provides, offers to the public, or otherwise traffics in technology, products, services, devices, components, or parts thereof, that are primarily designed or produced for the purpose of circumventing technological measures and/or protection afforded by technological measures that effectively control access to copyright protected works and/or portions thereof.
Defendant’s Octoparse Scraping Services or parts thereof, as described above, have no or limited commercially significant purpose or use other than to circumvent technological measures that effectively control access to Meta and its user’s copyrighted works and/or portions thereof in order to scrape copyright protected data from Facebook and Instagram.
So, much of that is bullshit. Octoparse seems like a pretty useful service for researchers and others looking to extract data from websites. There are tons of non-nefarious reasons for doing so, including research or building tools to enable people to access content on social media sites without having to set up an account and give all your info to Meta.
In other words, this lawsuit seems dangerous in multiple ways — an expansion of DMCA 1201, and a tool that Meta can use in a similar manner to what it did with Power and the CFAA to effectively limit competition and to build higher walls for its silos.
The second lawsuit, admittedly, involves a much, much sketchier defendant (which may be why Meta seems to be playing it up, and why much of the press coverage focuses on this lawsuit, rather than the Octoparse one). It’s against a guy named Ekrem Ates, who is apparently based in Turkey and runs (or possibly ran) a website with the evocative name of MyStalk.
MyStalk would scrape information from Instagram users, and repost it to its own site, so that users could follow an Instagram user’s stories without (1) having to log in to Instagram or (2) revealing to the original uploader who was viewing the video. For semi-obvious reasons you can see why this is a bit… creepy. And stalkerish (I mean, the name doesn’t help). But, there are potentially useful reasons for such a service. I mean, in some ways it’s similar to the Nitter service that some people use to view tweets without sharing information back to Twitter.
But, again, Meta insists this is nothing but evil.
Beginning no later than July 2017 and continuing until present, Defendant Ekrem Ateş used unauthorized automation software to improperly access and collect—or “scrape”—the profiles of Instagram users, including their posts, photos, Stories, and profile information. Defendant’s automation software used thousands of automated Instagram accounts that falsely identified themselves as legitimate Instagram users connected to either the official Instagram mobile application or website. Through this fraudulent connection, Defendant scraped data from the profiles of over 350,000 Instagram users. These profiles had not been set to private by the users and, beyond a limited number of profiles and posts, were publicly viewable only to logged-in Instagram users. Defendant published the scraped data on his own websites, which allowed visitors to view and search for Instagram profiles, displayed user data scraped from Instagram, and promoted “stalking people” without their noticing. Defendant also generated revenue by displaying ads on these websites.
Meta notes that it sent Ates a cease and desist letter (a la Power). Ates, apparently without a lawyer (and not very wisely), replied directly to the C&D, admitting to a bunch of stuff he probably should not have admitted to. He claimed that he shut down the services he ran and deleted the data, but also that he had sold the “mystalk” domain to someone else and no longer had control over it. Meta’s lawyers asked him to say who he sold it to, and Ates tried to use that as a negotiation tactic, saying he would reveal the information if Meta promised not to take legal action against him. Meta’s lawyers were, as lawyers are, somewhat vague, suggesting that something might be worked out, but without promising anything, and after that Ates went silent — leading to this lawsuit.
Ates does admit that he made about $1000 from the site, and says he got rid of it because it wasn’t worth it, and says he spent more than that maintaining the site.
This lawsuit is… strange on multiple levels. Ates is clearly a small time player, and he’s based in Turkey, so it seems unlikely he’s going to show up in a US federal court. A default judgment seems like the most likely outcome.
Like the Octoparse case, this one involves breach of contract and unjust enrichment claims, but then adds in California Penal Code § 502. This is the California equivalent of the CFAA.
So, yes, obviously someone setting up a website to allow people to “stalk” others is unsympathetic. But the underlying issue still remains: scraping data and extracting data is also a really useful tool. It’s useful for research. It’s useful for building additional services. It’s useful for creating competition and for limiting the ability of certain internet giants to control absolutely everything.
Yes, it can be abused. But it really feels here (yet again) that this is Meta/Facebook leaning hard on the fact that people keep complaining it doesn’t do enough to protect its users’ privacy as an excuse to get legal rulings that will increasingly shield the company from both scrutiny and competition.