Appeals Court Says That Scraping Public Data Off A Website Does Not Violate Hacking Law

from the phew dept

Thu, Apr 21st 2022 10:42am - Mike Masnick

For years now we’ve been following cases related to scraping data off of websites and the Computer Fraud and Abuse Act (CFAA). The CFAA is an extremely poorly drafted law that has been stretched by law enforcement and civil plaintiffs alike to argue that all sorts of things are “unauthorized access” and therefore hacking. We’ve covered many of these cases over the years. The courts have at least started to push back on some of the more extreme interpretations of the law, though it’s still problematic.

Over a decade ago, we followed a case that I still think is one of the most problematic rulings for the internet: when Facebook sued a small startup called Power.com. Power made a social media aggregator, allowing you to access all your different social media accounts through one interface and even to post messages across multiple platforms through that single interface. In order to do that, you had to provide your login to Power, which would access your social media accounts and suck out the data (or push in the data for posting). Again, this was the user willingly granting their login information. Leaving aside whether or not it’s wise to share your login info with a third party, it was still the user’s choice.

However, Facebook decided that this was hacking and in violation of the CFAA… and the courts (tragically) agreed, allowing Facebook to effectively shut down a useful service that would have prevented Facebook from locking up so much data (and becoming such a dominant player). The key reason the court sided with Facebook was its claim that once Facebook sent a cease-and-desist letter, that effectively meant that any further scraping was “unauthorized.” I still think that we’d see an extremely different competitive landscape today if the Power case had turned out differently. It would have significantly limited the ability of the big social media players to lock in their users. Instead, the ruling more or less turned Facebook into a roach motel where your data checks in, but it can never check out.

Other internet companies unfortunately followed suit, using similar lawsuits against websites providing useful complementary services. Craigslist went after 3taps, which made Craigslist data available to third-party apps. LinkedIn went after a company called HiQ that was scraping and making use of LinkedIn data. Here, unlike the Power case, the courts actually ruled against LinkedIn, saying that LinkedIn could not use the CFAA to block scraping of public data. The key difference between this case and the Power one was that HiQ was scraping public info (i.e., it didn’t need to log in to LinkedIn with someone’s info to access the data). LinkedIn appealed… and lost again. LinkedIn then asked the Supreme Court to weigh in, resulting in the Supreme Court vacating the 9th Circuit’s ruling and sending it back to the court to reconsider in light of last summer’s big Van Buren ruling that limited parts of the CFAA.

So now, with yet another chance… the 9th Circuit has correctly concluded the same thing. HiQ’s scraping of public information still does not violate the CFAA. There are a few different legal issues involved here, but the CFAA claims are the main event. LinkedIn argued that it sent a cease-and-desist to HiQ, so as per the Power ruling, HiQ’s continued scraping violated the law.

The panel reviewing this case goes deep into the CFAA, why it exists, and what it’s supposed to do before concluding that LinkedIn’s interpretation can’t be the correct one, noting that “the CFAA is best understood as an anti-intrusion statute and not as a ‘misappropriation statute,’” and as such accessing public information shouldn’t be a violation.

Put differently, the CFAA contemplates the existence of three kinds of computer systems: (1) computers for which access is open to the general public and permission is not required (2) computers for which authorization is required and has been given, and (3) computers for which authorization is required but has not been given (or, in the case of the prohibition on exceeding authorized access, has not been given for the part of the system accessed). Public LinkedIn profiles, available to anyone with an Internet connection, fall into the first category. With regard to websites made freely accessible on the Internet, the “breaking and entering” analogue invoked so frequently during congressional consideration has no application, and the concept of “without authorization” is inapt.

As for reconsidering in light of the Van Buren ruling, that doesn’t change things.

Van Buren’s “gates-up-or-down inquiry” is consistent with our interpretation of the CFAA as contemplating three categories of computer systems

[….]

Van Buren’s distinction between computer users who “can or cannot access a computer system,” suggests a baseline in which there are “limitations on access” that prevent some users from accessing the system (i.e., a “gate” exists, and can be either up or down). The Court’s “gates-up-or-down inquiry” thus applies to the latter two categories of computers we have identified: if authorization is required and has been given, the gates are up; if authorization is required and has not been given, the gates are down. As we have noted, however, a defining feature of public websites is that their publicly available sections lack limitations on access; instead, those sections are open to anyone with a web browser. In other words, applying the “gates” analogy to a computer hosting publicly available webpages, that computer has erected no gates to lift or lower in the first place. Van Buren therefore reinforces our conclusion that the concept of “without authorization” does not apply to public websites.

The court again distinguishes Power from the HiQ case by saying that Facebook limited access to the data to only those who were logged in, as opposed to the more public access available on LinkedIn.

In that case, Facebook sued Power Ventures, a social networking website that aggregated social networking information from multiple platforms, for accessing Facebook users’ data and using that data to send mass messages as part of a promotional campaign. Id. at 1062–63. After Facebook sent a cease-and-desist letter, Power Ventures continued to circumvent IP barriers and gain access to password protected Facebook member profiles. Id. at 1063. We held that after receiving an individualized cease-and-desist letter, Power Ventures had accessed Facebook computers “without authorization” and was therefore liable under the CFAA. Id. at 1067–68. But we specifically recognized that “Facebook has tried to limit and control access to its website” as to the purposes for which Power Ventures sought to use it. Id. at 1063. Indeed, Facebook requires its users to register with a unique username and password, and Power Ventures required that Facebook users provide their Facebook username and password to access their Facebook data on Power Ventures’ platform. Facebook, Inc. v. Power Ventures, Inc., 844 F. Supp. 2d 1025, 1028 (N.D. Cal. 2012). While Power Ventures was gathering user data that was protected by Facebook’s username and password authentication system, the data hiQ was scraping was available to anyone with a web browser

And thus this doesn’t fix the unfortunate precedent in the Power case, but at least it limits it from getting worse, while making it clear that scraping public web pages is not hacking, even if you’re sent a cease-and-desist letter.

Gerardo

April 21, 2022 at 10:55 am

Should help Missouri reporter

This should help that Missouri case, of the St. Louis Post-Dispatch reporter, who just looked at the HTML source.

Hope their governor looks at this and desists of calling that “hacking”

MathFox

April 21, 2022 at 11:04 am

Re: That would look bad

It would look bad on the governor if he admits that the government published that private information. He’ll continue calling those reporters hackers, rather than admitting government incompetence.

Anonymous Coward

April 21, 2022 at 3:57 pm

Re:

That case was fortunately dropped. The prosecutors declined the prosecution.

Something tells me this is a good thing but its impact in future cases of a similar merit might be limited.

Anonymous Coward

April 21, 2022 at 11:25 am

So yeah, login to everywhere with Facebook, but never login to Facebook from anywhere else. Totes legit.

ECA (profile)

April 21, 2022 at 12:41 pm

ping pong

Back to CompuServe and Prodigy.

Dale

April 21, 2022 at 1:31 pm

What's the difference between 'data' and 'content'??

Because if I can scrape data from a public website and use it, manipulate it, etc., then why can’t I scrape data from a public broadcaster and manipulate it and use it?? (especially if it’s only for my own consumption)

Rocky

April 22, 2022 at 5:02 am

Re:

The difference is that data and facts aren’t generally copyrightable but content is insofar that the content consists of more than just the data/facts.

So it is permissible to scrape a website’s content to get at any data and facts if it’s publicly available.

Bobvious

April 21, 2022 at 3:30 pm

Welcome to the Facebook California

where your data checks in, but it can never check out.

And last summer’s big Van Buren, https://www.youtube.com/watch?v=KBjehZLzisw

Naughty Autie

April 21, 2022 at 3:32 pm

So is this basically a reversal of the asinine decision that killed Scroogle ten years ago?

TheTechRobo

April 22, 2022 at 5:10 am

Hm, does this mean I can still get sued for writing a program to scrape services like Google Classroom?

I’ve been working on a tool to scrape a Google Classroom class with their internal API (I’m making it for myself, because I don’t trust the completeness of Takeout, but I might open-source it). That’s not public data, because you’d need to pass in your cookies. Would I be able to be sued for that, then?

glenn

April 22, 2022 at 9:25 am

In other news…

Macy’s covers their store windows so that no one can see their displays.

Anonymous Coward

April 22, 2022 at 11:02 am

So then Clearview can....

….legally scrape the web for photos. Doth the plot thicken?
