Google Built Its Empire Scraping The Web. Now It’s Suing To Stop Others From Scraping Google
from the the-open-web-is-closing dept
Last week, Google filed suit against SerpApi, a scraping company that helps businesses pull data from Google search results. The lawsuit claims SerpApi violated DMCA Section 1201 by circumventing Google’s “technological protection measures” to access search results—and the copyrighted content within them—without permission.
There’s just one problem with this theory: Google built its entire business on scraping the web without asking permission first. And now it wants to use one of the most abused provisions in copyright law to stop others from doing something functionally similar to what made Google a tech giant in the first place.
The lawsuit comes on the heels of Reddit’s equally problematic anti-scraping suit from October—which we called an attack on the open internet. Reddit sued Perplexity and various scraping firms (including SerpApi), claiming they violated 1201 by circumventing… Google’s technological protections. Reddit had cut a multi-million dollar licensing deal with Google for access to Reddit content, and it was mad that these firms were routing around both that deal and Google itself to provide similar results to users. The legal theory was bizarre: Reddit didn’t own the copyright on user posts, and the scrapers weren’t even touching Reddit directly—yet Reddit claimed standing to sue based on circumventing someone else’s TPMs.
So now Google has filed its own, similar lawsuit, going after SerpApi directly and focusing on how SerpApi gets around Google’s attempts to block such scraping. Google published a blog post defending the suit:
We filed a suit today against the scraping company SerpApi for circumventing security measures protecting others’ copyrighted content that appears in Google search results. We did this to ask a court to stop SerpApi’s bots and their malicious scraping, which violates the choices of websites and rightsholders about who should have access to their content. This lawsuit follows legal action that other websites have taken against SerpApi and similar scraping companies, and is part of our long track record of affirmative litigation to fight scammers and bad actors on the web.
Google follows industry-standard crawling protocols, and honors websites’ directives over crawling of their content. Stealthy scrapers like SerpApi override those directives and give sites no choice at all. SerpApi uses shady back doors — like cloaking themselves, bombarding websites with massive networks of bots and giving their crawlers fake and constantly changing names — circumventing our security measures to take websites’ content wholesale. This unlawful activity has increased dramatically over the past year.
SerpApi deceptively takes content that Google licenses from others (like images that appear in Knowledge Panels, real-time data in Search features and much more), and then resells it for a fee. In doing so, it willfully disregards the rights and directives of websites and providers whose content appears in Search.
Look, SerpApi’s behavior is sketchy. Spoofing user agents, rotating IPs to look like legitimate users, solving CAPTCHAs programmatically—Google’s complaint paints a picture of a company actively working to evade detection. But the legal theory Google is deploying to stop them threatens something far bigger than one shady scraper.
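For those who haven’t dealt with bot traffic, “spoofing user agents” is less exotic than it sounds: it just means sending HTTP headers that claim to come from an ordinary browser, usually while rotating through proxy IPs so no single address stands out. Here’s a minimal, purely illustrative Python sketch of that generic pattern (the user-agent strings and proxy addresses are made-up placeholders, not anything from the complaint):

```python
# Illustrative only: the generic "spoof the user agent, rotate the IP"
# pattern. This is not SerpApi's actual code, which is not public.
import random

import requests

# Strings that mimic real browsers, so each request looks like a
# different human visitor rather than one bot.
FAKE_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
]

# Routing each request through a different proxy hides the scraper's
# real IP address. These addresses are hypothetical placeholders.
PROXIES = ["http://198.51.100.10:8080", "http://203.0.113.25:8080"]

def fetch(url: str) -> requests.Response:
    """Fetch a page while masquerading as a random 'normal' browser."""
    proxy = random.choice(PROXIES)
    return requests.get(
        url,
        headers={"User-Agent": random.choice(FAKE_USER_AGENTS)},
        proxies={"http": proxy, "https": proxy},
        timeout=10,
    )
```

None of this is sophisticated, which is part of the point: the tooling is off-the-shelf, and detection is a perpetual cat-and-mouse game.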
Google’s entire business is built on scraping as much of the web as possible without first asking permission. The fact that they now want to invoke DMCA 1201—one of the most consistently abused provisions in copyright law—to stop others from scraping them exposes the underlying problem with these licensing-era arguments: they’re attempts to pull up the ladder after you’ve climbed it.
Just from a straight-up perception standpoint, it looks bad.
To be clear: this isn’t about defending SerpApi. They appear to be bad actors who built a business on evading detection systems. The problem is that Google chose to go after them using a legal weapon with a long history of collateral damage. When you invoke Section 1201 against web scraping, you’re not just targeting one sketchy company—you’re potentially rewriting the rules for how the entire open web functions. The choice of weapon matters, especially when that weapon has been repeatedly abused to stifle legitimate competition and could now be turned against the very openness that made the modern internet possible.
For many years, we’ve discussed the many, many problems of DMCA Section 1201. It’s the “anti-circumvention” part of the law, which says that merely attempting to get around a “technological protection measure” (or even just telling someone else how to get around one) can be deemed a violation, even if the TPM in question is wholly ineffective, and even if the circumvention has nothing to do with copyright infringement.
That has led to years of abusive practices by companies that put silly, pointless “TPMs” in place just so they could use the law to limit competition. There have been lawsuits over printer ink cartridges and garage door openers, among other things.
Here, Google says that in January 2025 it put in place a TPM called “SearchGuard” (which sounds like an advanced CAPTCHA of some sort) to prevent SerpApi from scraping its search results, but that SerpApi figured out a way around it:
When SearchGuard launched in January 2025, it effectively blocked SerpApi from accessing Google’s Search results and the copyrighted content of Google’s partners. But SerpApi immediately began working on a means to circumvent Google’s technological protection measure. SerpApi quickly discovered means to do so and deployed them.
SerpApi’s answer to SearchGuard is to mask the hundreds of millions of automated queries it is sending to Google each day to make them appear as if they are coming from human users. SerpApi’s founder recently described the process as “creating fake browsers using a multitude of IP addresses that Google sees as normal users.”
SerpApi’s fakery takes many forms. For example, when SerpApi submits an automated query to Google and SearchGuard responds with a challenge, SerpApi may misrepresent the device, software, or location from which the query is sent in order to solve the challenge and obtain authorization to submit queries. Additionally or alternatively, SerpApi may solve SearchGuard’s challenge with a “legitimate” request and then syndicate the resulting authorization, that is, share it with unauthorized machines around the world, to enable their “fake browsers” to generate automated queries that appear to Google as authorized. It also uses automated means to bypass CAPTCHAs, another aspect of SearchGuard that tests users to ensure they are humans rather than machines.
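The “syndication” allegation is the most technically interesting piece of that. In plain terms: one machine does whatever it takes to satisfy the challenge, then hands the resulting session credential to a fleet of other machines. A rough sketch of that pattern (every endpoint and name here is invented; the complaint doesn’t describe SearchGuard’s actual internals):

```python
# A hedged sketch of the "syndication" pattern the complaint alleges:
# one client passes the anti-bot challenge, then its session credential
# is shared with other machines. Everything here is hypothetical.
import requests

def pass_challenge_once() -> dict:
    # Stand-in for whatever it takes to satisfy the challenge from one
    # "legitimate"-looking client; the server responds with session
    # cookies that mark this client as authorized.
    session = requests.Session()
    session.get("https://search.example.com/?q=test", timeout=10)
    return session.cookies.get_dict()

def reuse_authorization(shared_cookies: dict, query: str) -> str:
    # Any other machine replays the shared cookies, so its automated
    # queries arrive looking as though they were already authorized.
    resp = requests.get(
        "https://search.example.com/",
        params={"q": query},
        cookies=shared_cookies,
        timeout=10,
    )
    return resp.text

# One solved challenge can then be fanned out to a fleet of scrapers:
cookies = pass_challenge_once()
results = [reuse_authorization(cookies, q) for q in ("foo", "bar")]
```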
Getting around these protections eats up Google’s resources, and sure, that must be annoying. But the real motivation shows up when the complaint gets to the economics of the situation. Google has started cutting licensing deals with content partners—most notably the multi-million dollar Reddit deal—and now those partners are pissed that SerpApi lets others access similar data without paying anyone:
For Google, SerpApi’s automated scraping not only consumes substantial computing resources without payment, but also disrupts Google’s content partnerships. Google licenses content so that it can enhance the Search results it provides to users and thereby boost its competitive standing. SerpApi undermines Google’s substantial investment in those licenses, making the content available to other services that need not incur similar costs.
SerpApi’s scraping of Google Search results also impacts the rights holders who license content to Google. Without permission or compensation, SerpApi takes their content from Google and widely distributes it for use by third parties. That, in turn, threatens to disrupt Google’s relationship with the rights holders who look to Google to prevent the misappropriation of the content Google displays. At least one Google content partner, Reddit, has already sued SerpApi for its misconduct.
This is where the 1201 theory becomes genuinely dangerous. Google’s argument, if accepted, provides a roadmap for any website operator who wants to lock down their content: slap on a trivial TPM—a CAPTCHA, an IP check, anything—and suddenly you can invoke federal law against anyone who figures out how to get around it, even if their purpose has nothing to do with copyright infringement.
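To see how low that bar could go, here’s a deliberately absurd sketch of a “TPM” under that theory (hypothetical, of course; nobody claims SearchGuard is this simple):

```python
# A deliberately trivial "TPM": deny requests from a blocklisted IP
# range. Under an expansive 1201 reading, simply switching IPs to get
# past this check could be framed as unlawful "circumvention."
BLOCKED_PREFIXES = ("203.0.113.",)  # hypothetical "known scraper" range

def access_controlled(client_ip: str) -> bool:
    # This is the entire "protection measure."
    return not client_ip.startswith(BLOCKED_PREFIXES)
```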
The implications spiral outward quickly. If Google succeeds here, what stops every major website from deciding they want licensing revenue from the largest scrapers? Cloudflare could put bot detection on the huge swath of the internet it serves and demand Google pay up. WordPress could do the same across its massive network. The open web—built on the assumption that published content is publicly accessible for indexing and analysis—becomes a patchwork of licensing requirements, each enforced through 1201 threats.
That doesn’t seem good for the prospects of a continued open web.
Google’s legal theory has another significant problem: the requirement that a TPM must “effectively control” access. Just last week, a court rejected Ziff Davis’s attempt to turn robots.txt into a 1201 violation when OpenAI allegedly ignored its crawling restrictions. The court’s reasoning is directly applicable here:
Robots.txt files instructing web crawlers to refrain from scraping certain content do not “effectively control” access to that content any more than a sign requesting that visitors “keep off the grass” effectively controls access to a lawn. On Ziff Davis’s own telling, robots.txt directives are merely requests and do not effectively control access to copyrighted works. A web crawler need not “appl[y] . . . information, or a process or a treatment,” in order to gain access to web content on pages that include robots.txt directives; it may access the content without taking any affirmative step other than impertinently disregarding the request embodied in the robots.txt files. The FAC therefore fails to allege that robots.txt files are a “technological measure that effectively controls access” to Ziff Davis’s copyrighted works, and the DMCA section 1201(a) claim fails for this reason.
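The “keep off the grass” analogy maps neatly onto how robots.txt works in practice: a crawler has to go out of its way to check the file, and nothing but its own good behavior enforces the answer. A minimal illustration using Python’s standard library (example.com is a placeholder domain):

```python
# The court's point, in code: robots.txt only matters if the crawler
# voluntarily consults it. Honoring the file is a deliberate, optional
# step, not an access control.
import urllib.request
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()  # fetch and parse the site's stated crawling preferences

url = "https://example.com/some/page"
if rp.can_fetch("MyCrawler", url):
    page = urllib.request.urlopen(url).read()
# Nothing technically stops a crawler from skipping the check above
# and calling urlopen(url) directly; the server serves the page
# either way.
```

Delete the can_fetch check and the server behaves exactly the same, which is precisely why the court said robots.txt doesn’t “effectively control” anything.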
Google will argue SearchGuard is different—it’s more than a polite request, it actively challenges and blocks scrapers. But if SerpApi can routinely bypass it by spoofing browsers and rotating IPs, does it really “effectively control” access? Or is it just a slightly more sophisticated “keep off the grass” sign that determined actors can ignore?
This question matters enormously, because it determines whether a statute that was supposed to prevent piracy of CDs and DVDs now also governs every attempt to access publicly available web pages through automated means.
For decades, we’ve operated under a system where robots.txt represented a voluntary, good-faith approach to web crawling. The major players respected these directives not because they had to, but because maintaining that norm benefited everyone. That system is breaking down, not because of SerpApi, but because of the rise of scrapers focused on LLM training, combined with companies eager to cut licensing deals and take a slice of the resulting money flows. Reddit and Google negotiating licensing deals over open web content was a warning sign of all of this, and now it’s spilling into the courts via questionable 1201 claims.
Both Reddit and Google frame this as protecting the open internet from bad actors. But pulling up the ladder after you’ve climbed it isn’t protection—it’s rent-seeking. Google built an empire on the assumption that publicly accessible web content could be freely scraped and indexed. Now it wants to rewrite the rules… using Hollywood’s favorite tool to block access to information.
The real problem isn’t that Google is fighting back against SerpApi’s evasive tactics. It’s that they chose to fight using a legal weapon that, if successful, fundamentally changes how we understand access to the open web. Section 1201 has already been wildly abused to stifle competition in everything from printer cartridges to garage door openers. Extending it to cover basic web scraping because SerpApi seems sketchy threatens the foundational assumption that published web content is accessible for indexing, research, and analysis.
Google has the resources to solve this problem through better engineering or by raising the actual cost of evasion high enough that SerpApi’s business model fails. Instead, they’ve opted for a legal shortcut that, if it works, will reshape the internet in ways that go far beyond one sketchy scraping company.
The internet is changing, and legitimate questions exist about how web scraping should function in an era of large language models and AI training. But those questions won’t be answered well by stretching copyright law to cover something it was never designed for, and empowering every website operator to demand licensing fees simply by putting up a CAPTCHA.
That’s not protecting the open web. That’s closing it.
Filed Under: 1201, anti-circumvention, circumvention, copyright, dmca 1201, licensing, open web, robots.txt, webcrawling
Companies: google, reddit, serpapi