Appeals Court Says That Scraping Public Data Off A Website Does Not Violate Hacking Law

from the phew dept

For years now we’ve been following cases related to scraping data off of websites and the Computer Fraud and Abuse Act (CFAA). The CFAA is an extremely poorly drafted law that has been stretched by both law enforcement and civil plaintiffs to argue that all sorts of things are “unauthorized access” and therefore hacking. We’ve covered many of these cases over the years. The courts have at least started to push back on some of the more extreme interpretations of the law, though it’s still problematic.

Over a decade ago, we followed a case that I still think is one of the most problematic rulings for the internet: when Facebook sued a small startup called Power.com. Power made a social media aggregator, allowing you to access all your different social media accounts through one interface and even to post messages across multiple platforms through that single interface. In order to do that, you had to provide your login to Power, which would access your social media accounts and suck out the data (or push in the data for posting). Again, this was the user willingly granting their login information. Leaving aside whether or not it’s wise to share your login info with a third party, it was still the user’s choice.

However, Facebook decided that this was hacking and in violation of the CFAA… and the courts (tragically) agreed, allowing Facebook to effectively shut down a useful service that would have prevented Facebook from locking up so much data (and becoming such a dominant player). The key reason the court sided with Facebook was its finding that once Facebook sent a cease-and-desist letter, any further scraping was effectively “unauthorized.” I still think that we’d see an extremely different competitive landscape today if the Power case had turned out differently. It would have significantly limited the ability of the big social media players to lock in their users. Instead, the ruling more or less turned Facebook into a roach motel where your data checks in, but it can never check out.

Other internet companies unfortunately followed suit, using similar lawsuits against websites providing useful complementary services. Craigslist went after 3taps, which made Craigslist data available to third-party apps. LinkedIn went after a company called HiQ that was scraping and making use of LinkedIn data. Here, unlike in the Power case, the courts actually ruled against LinkedIn, saying that LinkedIn could not use the CFAA to block scraping of public data. The key difference between this case and the Power one was that HiQ was scraping public info (i.e., it didn’t need to log in to LinkedIn with someone’s info to access the data). LinkedIn appealed… and lost again. LinkedIn then asked the Supreme Court to weigh in, resulting in the Supreme Court vacating the 9th Circuit’s ruling and sending it back to the court to reconsider in light of last summer’s big Van Buren ruling that limited parts of the CFAA.

So now, with yet another chance… the 9th Circuit has correctly concluded the same thing. HiQ’s scraping of public information still does not violate the CFAA. There are a few different legal issues involved here, but the CFAA claims are the main event. LinkedIn argued that it sent a cease-and-desist to HiQ, so as per the Power ruling, HiQ’s continued scraping violated the law.

The panel reviewing this case goes deep into the CFAA, why it exists, and what it’s supposed to do before concluding that LinkedIn’s interpretation can’t be the correct one, noting that “the CFAA is best understood as an anti-intrusion statute and not as a ‘misappropriation statute,’” and as such accessing public information shouldn’t be a violation.

Put differently, the CFAA contemplates the existence of three kinds of computer systems: (1) computers for which access is open to the general public and permission is not required (2) computers for which authorization is required and has been given, and (3) computers for which authorization is required but has not been given (or, in the case of the prohibition on exceeding authorized access, has not been given for the part of the system accessed). Public LinkedIn profiles, available to anyone with an Internet connection, fall into the first category. With regard to websites made freely accessible on the Internet, the “breaking and entering” analogue invoked so frequently during congressional consideration has no application, and the concept of “without authorization” is inapt.

As for reconsidering in light of the Van Buren ruling, that doesn’t change things.

Van Buren’s “gates-up-or-down inquiry” is consistent with our interpretation of the CFAA as contemplating three categories of computer systems

[….]

Van Buren’s distinction between computer users who “can or cannot access a computer system,” suggests a baseline in which there are “limitations on access” that prevent some users from accessing the system (i.e., a “gate” exists, and can be either up or down). The Court’s “gates-up-or-down inquiry” thus applies to the latter two categories of computers we have identified: if authorization is required and has been given, the gates are up; if authorization is required and has not been given, the gates are down. As we have noted, however, a defining feature of public websites is that their publicly available sections lack limitations on access; instead, those sections are open to anyone with a web browser. In other words, applying the “gates” analogy to a computer hosting publicly available webpages, that computer has erected no gates to lift or lower in the first place. Van Buren therefore reinforces our conclusion that the concept of “without authorization” does not apply to public websites.
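
To make the court’s framing a bit more concrete, here is a minimal Python sketch of the three categories and how the “gates” inquiry maps onto them. Everything in it (the `AccessCategory` enum, `categorize`, `gate_state`) is a hypothetical name made up for illustration; none of it comes from the opinion, and it is only a toy model of the taxonomy as quoted above, not a legal test.

```python
from enum import Enum, auto
from typing import Optional


class AccessCategory(Enum):
    """The three kinds of computer systems described in the quoted passage."""
    OPEN_TO_PUBLIC = auto()   # (1) open to the general public, no permission required
    AUTHORIZED = auto()       # (2) authorization required and given
    NOT_AUTHORIZED = auto()   # (3) authorization required but not given


def categorize(requires_authorization: bool, authorization_given: bool) -> AccessCategory:
    """Place a system in one of the three categories (illustrative only)."""
    if not requires_authorization:
        # No limitation on access exists, so whether permission was
        # "given" never comes into play.
        return AccessCategory.OPEN_TO_PUBLIC
    if authorization_given:
        return AccessCategory.AUTHORIZED
    return AccessCategory.NOT_AUTHORIZED


def gate_state(category: AccessCategory) -> Optional[str]:
    """Return "up" or "down" for gated systems, or None when no gate exists."""
    if category is AccessCategory.OPEN_TO_PUBLIC:
        return None  # no gate to lift or lower in the first place
    return "up" if category is AccessCategory.AUTHORIZED else "down"


# A public profile page: no authorization required, so there is no gate at all.
public_page = categorize(requires_authorization=False, authorization_given=False)
print(public_page.name, gate_state(public_page))

# A password-protected page accessed without permission: the gate is down.
protected_page = categorize(requires_authorization=True, authorization_given=False)
print(protected_page.name, gate_state(protected_page))
```

In this sketch the “up or down” question is only asked once a system requires authorization, which mirrors the court’s point that the inquiry applies to the latter two categories and has nothing to say about the first.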

The court again distinguishes Power from the HiQ case by saying that Facebook limited access to the data to only those who were logged in, as opposed to the more public access available on LinkedIn.

In that case, Facebook sued Power Ventures, a social networking website that aggregated social networking information from multiple platforms, for accessing Facebook users’ data and using that data to send mass messages as part of a promotional campaign. Id. at 1062–63. After Facebook sent a cease-and-desist letter, Power Ventures continued to circumvent IP barriers and gain access to password protected Facebook member profiles. Id. at 1063. We held that after receiving an individualized cease-and-desist letter, Power Ventures had accessed Facebook computers “without authorization” and was therefore liable under the CFAA. Id. at 1067–68. But we specifically recognized that “Facebook has tried to limit and control access to its website” as to the purposes for which Power Ventures sought to use it. Id. at 1063. Indeed, Facebook requires its users to register with a unique username and password, and Power Ventures required that Facebook users provide their Facebook username and password to access their Facebook data on Power Ventures’ platform. Facebook, Inc. v. Power Ventures, Inc., 844 F. Supp. 2d 1025, 1028 (N.D. Cal. 2012). While Power Ventures was gathering user data that was protected by Facebook’s username and password authentication system, the data hiQ was scraping was available to anyone with a web browser

And thus this doesn’t fix the unfortunate precedent in the Power case, but it at least keeps that precedent from getting worse, while making it clear that scraping public web pages is not hacking, even if you’re sent a cease-and-desist letter.

Companies: hiq, linkedin

Comments on “Appeals Court Says That Scraping Public Data Off A Website Does Not Violate Hacking Law”

12 Comments
TheTechRobo says:

Hm, does this mean I can still get sued for writing a program to scrape services like Google Classroom?

I’ve been working on a tool to scrape a Google Classroom class with their internal API (I’m making it for myself, because I don’t trust the completeness of Takeout, but I might open-source it). That’s not public data, because you’d need to pass in your cookies. Would I be able to be sued for that, then?
