We’re Walling Off The Open Internet To Stop AI—And It May End Up Breaking Everything Else
from the how-open-is-it? dept
A longtime open internet activist recently asked me whether I’d reversed my position on internet openness and copyright because of AI. The question caught me off guard—until I realized what he was seeing. Across the tech policy world, people who spent decades fighting for an open, accessible internet are now cheering as that same internet gets locked down, walled off, and restricted. Their reasoning? If it hurts AI companies, it must be good.
This is a profound mistake that threatens the very principles these advocates once championed.
There are plenty of reasons to be concerned about LLM/AI tools these days, in terms of how they can be overhyped, how they can be misused, and certainly over who has power and control over the systems. But it’s deeply concerning to me how many people who supported an open internet and the fundamental principles that underlie that have now given up on those principles because they see that some AI companies might benefit from an open internet.
The problem isn’t just ideological—it’s practical. We’re watching the construction of a fundamentally different internet, one where access is controlled by gatekeepers and paywalls rather than governed by open protocols and user choice. And we’re doing it in the name of stopping AI companies, even though the real result will be to concentrate even more power in the hands of those same large tech companies while making the internet less useful for everyone else.
The move toward a closed internet shifted into high gear, to some extent, with Cloudflare launching its pay-per-crawl feature. I will admit that when I first saw this announcement, it intrigued me. It would sure be nice for Techdirt if we suddenly started getting random checks from AI companies for crawling the more than 80k articles we’ve written that are then fueling their LLMs.
But, also, I recognize that even having 80k high-quality (if I say so myself) articles is probably worth… not very much. LLMs are based on feeding billions of pieces of content—articles, websites, comments, pdfs, videos, books, etc—into a transformer tool to make the LLMs work. Any individual piece of content (or even 80k pieces of content) is actually not worth that much. So, even if Cloudflare’s system got anyone to pay, the net effect for almost everyone online would be… tiny.
Of course, history has also shown that those setting up the tollbooths to be aggregators of such payments often do quite well. So I’m sure Cloudflare might do quite well out of this deal (and, honestly, I would trust Cloudflare to do a better job of this than many other companies, given its history). But the tollbooth/aggregators quite often become corrupt. Research on the history of these kinds of “collective licensing” intermediaries shows a long trail of corruption and other problems.
More concerning than the economic model, though, was what came next. None of this is to suggest Cloudflare will definitely go down the road of corruption, but the temptations will be there. And indeed, a secondary announcement from Cloudflare revealed a fundamental confusion about what kinds of internet access should be restricted. Last month, it accused AI company Perplexity of “using stealth, undeclared crawlers to evade website no-crawl directives.”
Plenty of people reacted angrily to the story, arguing it was proof of bad behavior on Perplexity’s part, but the details suggest that Cloudflare was conflating very different activities. It’s one thing to block scraper bots that are building up an index of content for training an LLM. That’s an area where it seems reasonable for some to choose to block those bots.
But what Cloudflare described was something different entirely:
We created multiple brand-new domains, similar to testexample.com and secretexample.com. These domains were newly purchased and had not yet been indexed by any search engine nor made publicly accessible in any discoverable way. We implemented a robots.txt file with directives to stop any respectful bots from accessing any part of a website…. We conducted an experiment by querying Perplexity AI with questions about these domains, and discovered Perplexity was still providing detailed information regarding the exact content hosted on each of these restricted domains. This response was unexpected, as we had taken all necessary precautions to prevent this data from being retrievable by their crawlers.
This is where the anti-AI sentiment becomes genuinely dangerous to internet openness. It’s one thing to say “no general scraping bots” but what Cloudflare is describing here is something much more fundamental: they want robots.txt files to control not just automated crawling, but individual user queries. That’s not protecting against bulk AI training—that’s breaking how the web works.
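For context, robots.txt has always been a simple, voluntary convention: a plain-text file of per-crawler directives that well-behaved bots choose to honor. A hypothetical file that opts out of bulk AI training while staying open to everything else might look like this (GPTBot and CCBot are the declared names of OpenAI’s and Common Crawl’s crawlers; the rest is illustrative):

```text
# Hypothetical robots.txt: ask known AI training crawlers to stay out,
# while leaving the site open to all other agents.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
```

Note what this file cannot express: it matches on self-declared agent names only. There is no directive that says “block bulk indexing but allow a single user-directed fetch,” which is exactly the distinction that gets lost when robots.txt is treated as a gate on individual queries.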
Let me give an example that hopefully clarifies why I find this problematic. A year and a half ago, I wrote about how I use LLM tools at Techdirt to help me with editing. A lot has changed in the 17 months since that was written, but I still use the same tool, Lex, to help me with editing what I write. And one thing I’ve found to be super useful in my final edit is that I give the tool a list of all sources I used in writing the article so that it can fact check (it will also search other sources for me, which is quite useful as it will—with surprising frequency—find useful sources to add more relevant information to an article).
But, increasingly, I’m finding that for certain news sites, it refuses to read them, and I’m guessing it’s because of various lawsuits some publishers have filed. So, for example, I find that the tool I use refuses to read NY Times or NBC News stories. But, I’m not trying to train an AI on those articles. I’m just asking it to read over the article, read over what I’ve written, and give me a sense of whether or not it believes I’m writing a fair assessment based on those articles.
When the AI is able to read that content, I find it incredibly useful in making sure that my reporting is accurate and clear. But there are times I’m unable to, because these publishers have taken such an extreme view of these tools that they seek to block any and all access.
This illustrates the core problem: we’re not just blocking bulk AI training anymore. We’re blocking legitimate individual use of AI tools to access and analyze web content. That’s not protecting creator rights—that’s breaking the fundamental promise of the web that if you publish something publicly, people should be able to access and use it.
Consider the broader implications: if we normalize blocking AI tools from accessing web content, where does it end? We’ve talked in the past about how many visually impaired users rely on technological tools to “read” websites for them. If we establish that all technological intermediary tools can be blocked without payment, we’re not just hurting AI companies—we’re potentially breaking accessibility tools that people depend on.
There’s a world of difference between “scrape this site to add it to a massive corpus of data” and “hey, can you just look at this one site to see what it says?” One is a big scraping job and one is simply a user-directed prompt.
Cloudflare’s complaint against Perplexity seems to conflate the two and pretend they’re the same. And I wasn’t the only one who noticed how odd this is, especially if you believe in an open web. On an open web, if I point a browsing tool at an open website, the tool should be able to read that website.
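The crawler-versus-user distinction can be made concrete in code. Python’s standard library ships a robots.txt parser, and a short sketch shows that the protocol only ever matches on self-declared agent names; it has no vocabulary for distinguishing a one-off user-directed fetch from bulk indexing. (The site and tool names below are made up for illustration; CCBot is Common Crawl’s real crawler name.)

```python
from urllib.robotparser import RobotFileParser

# A hypothetical site's robots.txt: block one bulk training crawler,
# allow everyone else.
rules = """
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
""".strip().splitlines()

rp = RobotFileParser()
rp.parse(rules)

# A bulk crawler that honors robots.txt would skip the whole site...
print(rp.can_fetch("CCBot", "https://example.com/article"))           # False

# ...while any other agent, including a user-directed browsing tool,
# is permitted. The protocol only knows agent names; it cannot say
# "this fetch is one user's query" vs. "this fetch is bulk indexing."
print(rp.can_fetch("MyBrowsingTool", "https://example.com/article"))  # True
```

That gap is the whole dispute in miniature: whether an AI assistant fetching one page at a user’s request should be treated as the named crawler, or as the user’s own agent.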
The collateral damage from this conflation is already spreading beyond AI companies.
Take, for example, Reddit telling the Internet Archive that it was going to start blocking its crawler from archiving Reddit feeds, because it was worried that AI companies were simply getting access to its content (that Reddit now is looking to license) by going to the Wayback Machine instead.
Here we see the real economic driver behind much of this: Reddit has discovered that user-generated content can be a revenue stream through AI licensing deals. But rather than finding ways to capture that value while preserving archival access, they’re choosing to break historical preservation entirely. We’re losing decades of human discourse and cultural history because Reddit wants to ensure AI companies pay for access to fresh content.
All of this suggests we’re moving very far away from an open internet, and towards one where it’s not just “pay to crawl” but it’s “pay to click” to get access to anything online.
Common Crawl, a non-profit at the center of some of these fights, is finding itself in a tough spot as well. It’s spent many years creating incredibly important and useful archives of the web. Those archives have been essential for many important research projects. But the Common Crawl archives have also been quite useful to LLM companies, and Common Crawl has been trying to navigate all of this. Unlike some others, its scanning bot is quite clear about who it is and seeks to be as “friendly” as a scraping bot can be. It’s not trying to sneak around, yet it’s suddenly facing challenges where it can’t accurately archive large parts of the web any more.
The Common Crawl situation perfectly illustrates how anti-AI sentiment is destroying valuable public resources. Common Crawl has been crucial for academic research, journalism, and public interest projects for over a decade. Researchers have used its archives to study everything from the spread of misinformation to the evolution of web technologies. But because AI companies also found the archives useful, Common Crawl is now being shut out of large parts of the web.
This is the definition of cutting off your nose to spite your face. We’re destroying a public good that benefits researchers, journalists, and civil society because we’re afraid that AI companies might also benefit from it.
And all that means that the web isn’t that open anymore. And that’s sad to think about.
Common Crawl is now suggesting that forward-thinking companies will start to treat enabling open crawling of their websites as an updated form of “search engine optimization,” or, in this case, AI optimization. At least some companies seem to agree: as more searches move from traditional search queries to LLMs, managers want information about their brands, and links to them, to appear in AI search results:
A significant number of websites currently block CCBot (Common Crawl’s web crawler), often without realizing its role in the ML and research ecosystems. Common Crawl publishes monthly web datasets which serve as foundational training data for major AI models and research initiatives.
As SEO practitioner Ash Nallawalla (author of The Accidental SEO Manager) wrote:
“A manager asked me why our leading brand was not mentioned by an AI platform, which mentioned obscure competitors instead. I found that we had been blocking ccBot for some years, because some sites were scraping our content indirectly. After some discussion, we felt that allowing LLM crawlers was more beneficial than the risk of being scraped, so we revised our exclusion list.”
If CCBot can’t crawl your site, your content is absent from one of the key datasets on which AI models are trained, potentially making your brand less visible in AI-powered search results.
This quote reveals the fundamental tension in the current approach. Companies are discovering that blocking AI access doesn’t just prevent training—it makes them invisible in an increasingly AI-mediated web. As Judge Mehta just noted in the Google antitrust remedies ruling, AI is beginning to encroach on the historical search market. As more people use AI tools for search and research, being blocked from AI training datasets means being blocked from discoverability.
We’re creating a two-tier internet: sites that can be found and accessed through modern tools, and sites that can’t. Guess which tier will thrive?
In other words, there is a lot going on across the board here. You have some companies who want to appear in AI results. You have some (including us at Techdirt!) who don’t mind it when AI scanners crawl and learn from our content, so long as they don’t take down our servers.
But, increasingly, we’re seeing people have such a negative, knee-jerk, anti-AI stance that they may be shutting off access to the web in a manner that could lead to the death of an open web, and could lead much more towards a pay-to-access model on the web, which I think is a result that most of us would regret.
And this is what I fear we’re going to end up with: an internet where large platforms control access through licensing deals and technical restrictions, where public archives are neutered to prevent AI companies from accessing them, and where individual users can’t use modern tools to access and analyze web content. It’s a world where Google, Microsoft, and Meta get special access through billion-dollar licensing deals while everyone else—researchers, journalists, small businesses, individual users—gets locked out.
The power and excitement of an open web was that it was open and accessible to all. The web’s core principle wasn’t “open to everyone except the technologies we don’t like.” It was “open, period.” Once we start making exceptions based on who might benefit or what technology might be used to access content, we’ve abandoned that principle entirely.
We’re not protecting creators or preserving the open internet—we’re helping to destroy it. The real winners in this new world won’t be individual writers or small publishers. They’ll be the same large tech companies that can afford licensing deals and that have the resources to navigate an increasingly complex web of access restrictions. The losers will be everyone else: users, researchers, archivists, and the long tail of creators who benefit from an open, discoverable web.
None of this means we should ignore legitimate concerns about AI training or creator compensation. But we should address those concerns through mechanisms that preserve internet openness rather than destroy it. That might mean new business models, better attribution systems, or novel approaches to creator compensation. What it shouldn’t mean is abandoning the fundamental architecture of the web.
And that would be unfortunate for all of us.
Filed Under: ai, bots, crawling, intermediaries, open internet, scraping
Companies: cloudflare, common crawl, internet archive, reddit


Comments on “We’re Walling Off The Open Internet To Stop AI—And It May End Up Breaking Everything Else”
Ugh, it just gets worse and worse still, doesn’t it?
robots.txt files were created to limit what search engines are allowed to crawl (mostly to prevent irrelevant pages, like sign-in pages, from being listed in search engine results). When search engines became widely used, and Google became the main gateway to the internet, most robots.txt files became less strict about forbidden content, but also began forbidding other search engines (Bing has a long history of impersonating Google, or even crawling Google, to build its own index). If LLMs become a more widely used tool (and it seems things are starting to go that way), websites will let them in more openly, just as there are many more search engines today than when Google started.
The main difference is that LLMs don’t produce much traffic for websites (even when they can remember their sources), still hallucinate greatly from the crawled content, and the few major AI companies receive tremendous amounts of cash to crawl the web while the small websites being crawled may struggle to pay the bills.
What are people supposed to do though?
Traffic isn’t free, and in some cases AI training is equivalent to a DDoS.
And those doing AI scraping behave exactly like bad actors.
So should anyone hosting a website just bend over and pay the cost all the scraping incurs?
Maybe, just maybe, we should take the AI fuckers and whip them for refusing to even be civil instead of having everyone else bend over to them.
Don’t blame the victims, blame the people hammering smaller sites on the scale of a DDoS attack while refusing to even contemplate obeying the rules that built the open internet. It isn’t knee jerk when the attack has been ongoing for years now and the costs of running anything is going up while income from advertisements and views from actual humans is decreasing.
The tech giants have created the wall that is destroying the internet. AI results have been erected as a wall between people googling or bing-ing something and external sites, to stop users from going elsewhere and to obliterate the amount of money the giants have to pay the people who run their adverts. What choice have the tech giants left people but to block them by any means necessary, given all that they take while trying to give nothing in return?
Re:
The thing is, everyone is a victim, not just the sites getting hammered. The value of the internet is its openness. You can always route around one obstacle to your own education and freedom by finding another path, but if it all starts to get walled off, the utility and thus the freedom goes down. It’s important not to kill the internet to “save” it.
Aegrescit medendo.
Re:
Lots of abusers claim to be “victims”. That doesn’t make it okay.
I’m a victim of this “robot”-blocking. I went to check for a new OpenWRT release, and was instead told to prove I was human. Their “downloads” sub-domain is still accessible, at least. I can’t read Linux kernel mailing list messages on the web either, but they still provide “public-inbox” archives.
We’re basically told to take everyone’s word that robot traffic is harmful, although we have little actual data. I have my doubts. As has been pointed out here, there was a lot of work 25 years ago to support 10,000 clients at a time; but that work was finished a decade later, by which time a single server could do about 1-10 million. How many crawlers are there?
I think it’s more likely that people squandered decades of computing-power gains and just got away with it till more crawlers appeared. But if anyone’s to compete with Google and Bing, we need more crawlers. Another big project to improve efficiency would be better overall than wasting everyone’s electricity to prove their “humanity” (by running Javascript… which is apparently a thing humans do better than robots? Like that’s gonna stop people with billion-dollar data centers. They’re already getting around this shit.)
Re: Re:
Are you seriously calling people fighting to stop what is effectively a continual DDoS attack, being done by billion dollar companies hell bent on cutting them off from any potential traffic, abusers? Get a grip.
Re: Re: Re:
If cancer could speak, it would make similar complaints about chemotherapy being so bad for the body.
Re: Re:
Good lord. This is why I miss fuckinggoogleit.
https://www.theregister.com/2025/08/21/ai_crawler_traffic/
https://news.designrush.com/80-percent-of-web-traffic-is-bots-the-hidden-cost-of-ai-scraping
https://news.ycombinator.com/item?id=45105230
https://www.404media.co/ai-scraping-bots-are-breaking-open-libraries-archives-and-museums/
“As has been pointed out here, there was a lot of work 25 years ago to support 10,000 clients at a time; but that work was finished a decade later, by which time a single server could do about 1-10 million. ”
This is just a complete misunderstanding. Your car can do 100 mph, why don’t you drive that constantly everywhere? Yes. Things can be built to scale out and handle incredible load, BUT THAT IS NOT FREE. In whatever field you work in, how many customers, widgets, or whatever can one person handle at a time? It’s infinite right? No. A person can only do so much in x amount of time and then you need to hire more, then you need managers, then you need a new building, and then.. Computers work the same way, and people have budgets.
“I think it’s more likely that people squandered decades of computing-power gains and just got away with it till more crawlers appeared.”
Go run a website you make no money off of; let’s see how long you can afford it.
Re: Re: Re:
Your 404 Media link is paywalled, and the Ycombinator link is just another link to the Register story. The Fastly report mentions 39,000 requests per minute, which doesn’t seem like a huge number for a large site; only 650 per second. And that was one instance on one site, not a common thing.
Given how often regular people are accused of being bots—it’s happened to my grandparents who have no non-standard browser settings at all (and, having no idea what to do, couldn’t use that news site anymore)—I have to seriously question those numbers anyway. Design Rush references “proprietary data”. Fastly references “heuristics” and classifies 87% of bot traffic as “malicious”, which leaves at most 13% as “A.I.” (lumped in with search engines, the Internet Archive, and other such things). But it says 90% of “A.I.” traffic is Meta, Google, and OpenAI. If they can identify those crawlers, which I think do identify themselves, why not just block them, and leave the humans alone? That’d take “A.I.” traffic from 13% of all bot traffic to 1.3%.
This just seems like another unjustified “A.I. freakout”, though: focus on the 13% supporting the desired narrative, and nevermind the 87% that is account cracking, “ad fraud”, and so on. And there’s no comparison to historical numbers. Before Google took over, we had dozens of search engines crawling the web constantly; on weaker servers with slower connections and fewer “unlimited traffic” options. Is the current situation worse?
Re: Re: Re:2
… Please, shut up. You clearly know and understand NOTHING and have zero interest in learning.
“The Fastly report mentions 39,000 requests per minute”
Do you have a brain? Have you ever thought that not all tasks are equal?
Go run around the world, it’s just as easy as clapping your hands!
“If they can identify those crawlers, which I think do identify themselves, why not just block them, and leave the humans alone? ”
GO FUCKING READ! For fuck’s sake, the only thing worse than a trump loving pedo is the willfully stupid.
Re: Re:
I weep for you.
Re:
This. Although I’ll point out that it’s not equivalent to a DDoS attack; it is a DoS attack.
I’ve spent decades studying attacks and abuse, and this is one of the worst I’ve ever seen. It’s massive, it’s relentless, and the people behind it simply don’t care what they destroy. They’re using every creative/duplicitous trick in the book to avoid being held accountable and to evade countermeasures. That’s why there are all kinds of public and private anti-AI-crawler projects, large and small, that should never have been necessary — but ARE necessary, because these attacks are knocking sites off the air and costing a lot of people a lot of time, money, effort, lost sleep, and everything else.
Don’t blame the victims of the sociopathic greedy thugs at AI companies for this mess. They could have chosen to play nice in the sandbox, in the spirit of collaboration that we used to build the Internet. But no. They decided to be complete assholes, so they — and you, and everyone else — should not be surprised that we’ve decided not to put up with this nonsense.
Re: Re:
Ah, the words of someone highly expert, which (and whom) I can respect.
I remember the Old search engines.
I could find Tons of things that NOW you Cant. And I would have links if the last 2 computers had saved all the links to the net.
I can find many things Similar on YT, Now, but they are being BEATEN ON DAILY.
How many OLD Video sites are still around? Like Daily motion?(dont go there), And other that have been in cout so many times they CANT MAKE MONEY unless they HIDE.
I mentioned something(that really wont work) that there are 15 nations NOT dealing with Copyrights or IP. AND not acknowledging International Copyrights. And how it would be interesting to put up Porn sites and Other BANNED sites in those nations. LIKE Copies of Movies and music, that have Been MONITIZED to the point you CANT OWN ANYTHING, and Companies can Actually REMOVE, without Notice. Data they they placed and SOLD to you.(how many people LIKE ITUNES, NOT).
Re: Good news and bad news
Good news, I think this poster is in fact human. Bad news, the word salad of CAPS and (parentheses) makes things incomprehensible.
On the bright side, I can’t find you on Google.
Re: Re:
They are human and a long-time resident of the commentariat. Mostly agreeably incoherent.
Re: Re: Re:
I can’t tell whether they’re agreeable or not as I cannot read their word salad & capitalisation nightmare-fuel comments.
You’re calling it knee-jerk. But a lot of this stuff is in direct response to things those AI companies themselves are doing, like not respecting robots.txt. Take your own example of the difference between a crawler and a user agent: the problem is AI companies will literally lie about being a user agent, and crawl it anyway. (This is, not coincidentally, a big part of why AI companies are interested in making browsers. They get to do local queries that look like you.)
AI companies have been maximally shitty stewards of the open web. This is what happens when you have wide scale irresponsible use. This doesn’t even get to ethics on training, it’s “stop pounding my server into dust and costing me money I can’t afford”. Fundamentally, part of having an open web is that people need to use it responsibly, and AI companies aren’t. You can’t have an open system that’s dependent on hostile users. This isn’t new, if you had an abusive crawler in the past you’d get blocked, too.
Yeah, the problem is they are in fact doing that for a lot of people’s servers. And even when it doesn’t take it down, you’re paying for it. How much are you ok with paying, especially when the same crawler hits you repeatedly instead of caching? And what about smaller sites that can’t afford it? AI crawling is killing little sites, too.
If we’re going to tell people to nerd harder, a good place to start might be AI companies respecting the commons. You can’t complain about people leaving the pool after you piss in it.
Re: AT LEAST
Put up the PRivacy act..
A REAL protection of the data and personal info.
YOU CAN be tracked, by anyone, if they know how the system work..
At least when we had Hard wired Phones All they got as an Address. From the Phone book.
Re:
This.
It’s like that GOP lawsuit and investigation into Gmail spam blocking. If you do not want to be treated like a bad actor, then don’t act like one.
Re: Re:
Oho. This. This, indeed, and most emphatically.
Another issue with AI scraping that’s overlooked is that it completely fucking hammers the people’s compute resources.
I run a website and I have been dealing with the onslaught of AI scrapers for months, they reduce the hardware I’m running the site on to skin and bones. They intentionally try to make themselves blend in with other users. I’ve had to create incredibly complex rulesets and stuff, completely ban lots of crawlers, make users on certain ISPs and even entire countries fill out a captcha before accessing the website, etc, to be able to alleviate the load that these scrapers put on the website. The end result is that my users now have a way worse experience and many of them have to fill out captchas.
Another website I use quite frequently, which allows people to make freedom of information requests in New Zealand, is also suffering from AI scrapers. For months it’s been incredibly slow, and sometimes completely down; recently the admins put up a notice saying the website’s issues are being caused by the load AI scrapers are putting on their resources.
These AI scrapers are the devil and are completely killing the Internet by making it incredibly difficult for independent website operators to be able to run without being concerned about this nonsense.
Re:
“Another issue with AI scraping that’s overlooked is that it completely fucking hammers the people’s compute resources.”
Yes.
Even those of us with a lot of experience in performance tuning — at the network level, at the web server level, at the OS level, etc. — are finding that we can’t maintain previous service levels without large investments of time and money. Because of the thugs at AI companies.
And even when we do that — as I did last year when I replaced a server that was working perfectly fine pre-AI-crawlers with one that cost three times as much — the respite is only temporary. The crawlers effectively negated all that money and all the accompanying work in just a few months. Because of the thugs at AI companies.
Other people that I work with, collaborate with, or just correspond with have gone through similar things. A lot of them aren’t trying to expand their sites or improve them because they can’t — all of their resources are going into just trying to survive. And just as I discovered, they know that anything they add will just make their sites a bigger target. Because of the thugs at AI companies.
Libraries, museums, archives, and other resources that are perpetually starved for money, places where people work out of dedication to the concept, not because they’re going to get rich, are being decimated. Long-standing resources in science and technology are asking for help that they never needed before. Because of the thugs at AI companies.
We don’t need new business models, we don’t need to nerd harder, we don’t need any of that crap. What we need is for the thugs at these AI companies to behave like (at least) minimally decent human beings. There’s no tech fix if they don’t…
…although there may be a legal one. There are ongoing discussions of a massive class-action lawsuit. It’s unclear that’ll go anywhere, but I certainly would applaud it, provided it was for at least $1T — and that, by the way, is likely a serious underestimate of the aggregate cost of all this.
This is a firm, firm no, Mike.
I’m sorry, but letting AI abuse the everloving hell out of websites, after they have exploited the everloving hell out of our economy and copyright in the first place? No.
The dollar cost to the world from running AI is astronomical on top of it, so this is literally asking the victims why they won’t stop hitting themselves.
We need a significant tax on the power and space AI usage consumes in datacenters, and arguably a ban on AI outside of scientific use entirely.
https://searchengineland.com/google-web-thriving-dying-461653 is as clear as day on both the source and the outcome.
Re:
And when they inevitably implode, the VC boyz are gonna make everyone else feel it.
The so-called open web
What a horrendous take.
The promise of the open web was made in quite literally a different era. I had my first webpage in the 90s, and it was hosted on a University server where I was a student. Then there was geocities and other similar sites. For a long time, the open web was predicated on the fact that text pages have almost no bandwidth costs and Universities could act as repositories of useful knowledge, or that small ads could keep servers running. A few pennies here, a few pennies there, and eventually you were talking about real money.
Now it’s just Facebook and Google making the lion’s share of the ad money, and their revenue is declining. AI scrapers have all of the downsides of search engine browsing with none of the upsides: they’re bandwidth intensive, but they bring you no traffic at all. If you have any sort of ad support, you’re guaranteed no clickthrough. If you’re relying on traffic to drive engagement and possibly a subscription or a patreon payment, you’re super boned–the LLM isn’t going to bring someone to your site, they’re just gonna gobble up your content and you’ll never see any benefit.
AI is the death of the open web because the model doesn’t work anymore. “Oh no,” you’re saying, “if AI can’t scrape the web, the information is kept out of the hands of people!” You’ve forgotten that if nobody can afford to keep their websites running, the information ALSO disappears.
When there’s a shared pasture, farmers will come together and enforce usage limits, because otherwise we get the tragedy of the commons. But that tragedy was historically less frequent than you’d expect, because people want to get along with their neighbours, and everyone can be made to understand what the common good is.
But faceless corporations do not care about getting along with you. They will take your information, eat your bandwidth, drive you off the web and never look back. They haven’t even tried to come up with accommodations, they just consume endlessly and try to sell your own content back to you as masticated slop.
Re:
You really enjoy keying your own car to prevent vandalism too, right?
“Why does everyone want to fight against the Tragedy being inflicted upon this Commons??”
Jeez, with how much criticism you’re getting, Mike, I’m wondering if you’re alright?
Re:
Hmm? I see some criticism of the post, but nothing particularly harsh. It’s an opinion piece, and I fully expected some people to disagree. I still stand by the piece and think that it raises key points missed by many of the critics, but why should I not feel alright?
Re: Re:
I was just wondering.
Re: Re:
Maybe they’re someone who takes criticism extra-personally and therefore equates criticism to, I’unno, being stabbed in the gut or something.
Re: Re: Re:
Hahahahahaha…. No.
I’m not that guy, though I’ll admit I act like that.
I understand everyone has different opinions, but there were 26 comments (at the time) that seemed to criticize him.
Luckily, he expected the criticism.
Still, that was funny.
Re: Re: Re:
Or just the normal troll behavior of suggesting one may not, or should not, be all right: largely directed at the public, but if it pokes the presumed target, that’s a bonus.
Re: Re:
I’m wondering how you would feel if your website got DDoSed and you had to pay for the server load. I think there is a serious lack of understanding of how websites (or, more accurately these days, web applications) work, and how expensive, or just dangerous, it can be to let all traffic through all the time.
The overall point this piece makes is: bend over and take it. It ignores the history and context of the overall issue. We wouldn’t be here had AI companies decided to follow the rules, the law, or even just etiquette. AI companies purposely hid who they were and what they were doing for years, just so people couldn’t refuse them. Then they decided to simply ignore the systems that had been set up previously. Then they went so far as to straight up pirate stuff and commit theft, because working with people above board was just too hard and slow.
So no, there are no good points here, except maybe that this is the same type of absolute horse shit that those who defend abusers say. You could replace this article with one blaming a wife for hurting her husband because she left him over his abuse.
Re: Re: Re:
I mean, I run THIS website, and yes, on occasion we have gotten DDoSed, whether on purpose or not, and we figure out ways to block it and move on.
It suggests no such thing. I am curious what you read, because it was not this piece.
I repeat. It suggests no such thing.
I mean, most of that is either untrue or misleading, which makes me question why you feel the need to opine on something you appear to know little about.
You do not seem particularly connected to reality.
Re: Re: Re:2
What parts of that are untrue or misleading?
Re: Re: Re:
That’s fucking harsh.
Geez.
Also:
We wouldn’t be here had AI companies decided to follow the rules, the law, or even just etiquette.
👆
The sad but true part here. 😔
Agreed
I’m broadly on board with your take, Mike. And more than a little saddened to look around at people in my generation, who also grew up with the open web, seemingly deciding that everything we did then was wrong; that, in fact, we should not allow easy, free, open access to information; that we should abandon broad access to information and instead swing towards tighter copyright, age verification, and a wide variety of blocks and checks for various reasons, because maybe it will hurt an AI company somewhere (even though, based on everything I’ve seen, there’s an essentially zero percent chance it will).
Re:
There are at least two things you are (intentionally?) conflating there. One is the stupid human reactions to whatever, which are as stupid as the AI companies; no good side of the supposed two there. Then there are those who are literally just trying to defend their networks and compute; and sure, it’s a given that not all of them are doing so optimally, but who the hell could reasonably expect that?
Access controls are not optional
So if AI should go ahead and ignore robots.txt in response to a user’s query, should it also look for bugs in login prompts so it can get through those as well?
If you want to argue that site admins shouldn’t configure those files to block AI, I could maybe be convinced; I haven’t bothered to put any on my websites. But deliberately ignoring access control mechanisms configured by the webmaster, especially at scale and in what seems to be a matter of corporate policy, is absolutely not OK. Pretty sure that’s a federal crime here in the US under the CFAA. The fact that it’s merely a robots.txt file doesn’t seem to matter; the law only refers to “exceeding authorized access”, and from what I’ve seen it seems pretty clear that the AI scrapers are DEFINITELY doing that.
A big part of the reason there is so much collateral damage is because so many of the AI companies are deliberately ignoring the orders not to scrape sites. So the admins have to take a much heavier approach. If you just say “No AI”, they ignore you and sometimes even change user agents and such to pretend to be something else. The AI companies are intentionally ruining it for everyone else. But you’d rather blame literally anyone else in order to defend the AI companies’ industrial-scale criminal vandalism of the web…
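For what it’s worth, honoring robots.txt is trivial; a compliant crawler is a few lines of stdlib Python. This is a minimal sketch using `urllib.robotparser`, with a hypothetical bot name (`ExampleAIBot`) and a made-up robots.txt, not any real crawler’s policy:

```python
# Sketch: what a well-behaved crawler does before fetching anything.
# The user agents and robots.txt content below are illustrative assumptions.
from urllib import robotparser

# A robots.txt a site might serve to opt out of AI crawling
# while staying open to everything else.
robots_txt = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# The AI bot is told to stay out entirely; a polite crawler stops here.
print(rp.can_fetch("ExampleAIBot", "https://example.com/articles/1"))  # False
# Ordinary user agents remain welcome.
print(rp.can_fetch("SomeBrowser", "https://example.com/articles/1"))   # True
```

Ignoring the file, or rotating user agents to dodge the `ExampleAIBot` rule, is a deliberate choice, not a technical limitation.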
Yes, the Cloudflare/Perplexity thing is as stupid as everyone else who “proves” their work is “stolen” by repeatedly prompting AI to look it up.
But you know what? An increasingly AI-mediated web is already not an open web. Never mind the external costs (since you never do anyway), with absolutely zero value as a result. So maybe the web has to die a closed death so we can reinvent it when we grow the fuck up a little.
We don’t even have to put garbage in anymore (though we certainly do!) to get the innovation of hot garbage out.
Copy-Paste the article into the AI prompt instead of the link
Sure, it’s a pain, but it works. I may write a script to do this, or better yet, search for one, because someone like me has probably already written one. Soon it will be a simple yet popular app that makes everyone’s life easier.
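The script this comment imagines is mostly just markup stripping. Here is a rough, stdlib-only sketch; the class and function names (`TextExtractor`, `article_text`) are my own invention, and a real version would fetch the page first (e.g. with `urllib.request`) and handle messier HTML:

```python
# Sketch: strip a page's HTML down to pasteable plain text,
# skipping <script>/<style> contents. Names here are illustrative.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def article_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)

print(article_text("<html><body><h1>Title</h1><p>Body text.</p>"
                   "<script>ignored()</script></body></html>"))
# prints "Title" then "Body text."; the script contents are dropped
```

From there, pasting the result into a prompt (or piping it to a clipboard tool) is the easy part.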
open web blackout day
Hey, remember the SOPA blackout day, January 18, 2012?
https://en.m.wikipedia.org/wiki/Protests_against_SOPA_and_PIPA
Maybe we should organize a day like that: sites suffering from AI-crawler DDoS all go black together.
Raise awareness and make it hurt!
It's blowing my mind that there's an argument here
Basically every argument in this thread: AI is basically a DDoS attack, loaded with actively engaged, high-income customers I don’t want! I refuse to draw a distinction between training spiders and lucrative users who DARE use AI to find products and services. I only want customers to come in the old-fashioned way. The nerve of these people. They’re the ones destroying the internet. Talk to them about it!
Re:
So I can go ahead and break into your home at night — or even burn the whole house to the ground — as long as I’m considering making a large enough purchase from you at some later date? THAT is your counter-argument??
If the admins don’t want the AIs browsing their site, they have every right to put up those NO TRESPASSING signs. If the billion dollar AI companies choose to ignore those signs, that’s a federal crime here in the US. It doesn’t actually matter how much money these “users” (who are not actually visiting or using the site in question) might hypothetically be willing to pay at some undetermined future date. While folks like Trump certainly like to think that throwing around enough cash means you’re fully above the law, everyone who isn’t a billionaire generally agrees that it really shouldn’t work that way. (And everyone who is a lawyer agrees that the law doesn’t work that way…it’s just that corrupt cops and prosecutors often refuse to enforce it properly. Which is not something that most people defend.)
Re:
They aren’t customers. They come in uninvited and take whatever they find for free, so they can erect a wall between actual humans and the sites they are battering with their continual scraping attempts, depriving those sites of actual human visitors and advertising revenue. Google is aware that what it is doing is strangling the life out of the open internet, but it’s doing it anyway… But sure, it’s the victims of this that are the bad guys for pushing back.
Just like I do. Gonna block me too?
interesting
I am on the sidelines of all this as a web designer and host of a few small sites. I am not seriously affected, as my income is not directly connected to traffic, but I run several small AWS Lightsail instances and they have been wiped off the face of the earth: no humans, just crawlers of some sort. I really didn’t understand it, but ChatGPT has helped me at least get my burst level below maximum. The bots effectively made my sites unusable for over a year, and even now I am only available via a bunch of Cloudflare rules I don’t understand. I will expand if this post works, to save writing lots and then finding I need an account.
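The kind of throttling rule this host is leaning on (Cloudflare rate limits, burst caps) is usually some variant of a token bucket. A minimal sketch, with arbitrary numbers chosen purely for illustration:

```python
# Sketch: a token-bucket rate limiter, the idea behind "burst" limits.
# rate/burst values below are arbitrary assumptions, not anyone's config.
import time

class TokenBucket:
    def __init__(self, rate: float, burst: float):
        self.rate = rate          # tokens refilled per second (sustained rate)
        self.capacity = burst     # maximum burst size
        self.tokens = burst       # start full
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # over budget: reject (or delay) this request

bucket = TokenBucket(rate=5, burst=10)  # 5 req/s sustained, bursts of 10
results = [bucket.allow() for _ in range(12)]
# In a tight loop, roughly the first `burst` requests pass; the rest are throttled.
print(results.count(True))
```

A human browsing never notices a limit like this; a scraper hammering every URL hits the `False` branch almost immediately, which is why it tends to be the first mitigation people reach for.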
more interesting
I am only commenting because this feels like the end, really, and people might be interested in my position. It is, however, slightly grim or controversial. I run several sites, one on war and another few on true crime, none of which are PC-sanitised, but none of which are gross. I operate ethically, but I don’t think the powers that be like me. Straight off: I run black kalendar dot nl. I doubt you’ll find it in Google, as it can’t index it for some reason. This is not a major issue, as I was never about traffic, and it means my site is still there and I can work on it. It has about 40k cases that I wrote over 15 years or so. It’s very unusual, and I imagine a target for AI crawlers. I have responded by setting all sorts of rules which I don’t understand, resulting in Google not being able to index it. I also chopped off all content beyond the second tag; some stories were 100k words or longer.
My situation with this is that AI bots were, in theory, stealing all my content. However, the irony is that many can’t reproduce it, because they sanitise crime information. I have tried questioning ChatGPT about cases and it knows nothing about them. So I get hammered and none of the information is used directly. My only fear is that someone could quickly reproduce my 15 years of work. Much of the information comes from secure archive data that you have to access physically, one case at a time, so only I have it. I have converted some of it into books, but now, so can other people. So what you have here is a small party that can barely afford a simple 4GB RAM LAMP stack on AWS being effectively destroyed by bots that don’t care. I sort of don’t care, as it’s more a ‘labour of love’ (although it’s certainly not ‘love’) and not attached to income directly, although I do sell books. Part of my point is that I am the sole supplier of much of this information, which is very unique, and it’s clear that suddenly anyone can use it via a prompt.
I also run a war site, moderwar dot games, and the situation is exactly the same: red-lined on the CPU burst for over a year. There, people subscribe to play. I have had no complaints, but I don’t really have any users, maybe 1 or 2 at a time. However, the bots don’t care; it’s a living DDoS attack. Unlike the crime site, in which I fear my proprietary information was being used, there is little value in the information here; they are all simply games. I have used ChatGPT to recover CPU cost, including fixing my bad coding, and am now back afloat.
A third example was a customer’s online shop: his 40k products were a honeypot for bots. I simply didn’t understand what was happening, beyond the fact that his site was constantly going offline. It truly soured relations permanently, and now he is out of business. So from my perspective AI has destroyed the web. It even makes hit counters hard to work with, as it’s all bots.
If anything gets any serious size, the cost of the traffic becomes an issue. When dealing with customers now, you need to factor in the impact of AI, and they think you’re an amateur if you have doubts or are vague. They don’t want VPS solutions, and so small sites that have a lot of content tend to get hammered, so much so that people don’t bother. I truly believed in the open web; that’s why I did my crime site. It was old school, with no newsletters or ads or anything; it was totally free for innocent humans to read. But the bots destroy bandwidth, and the possibility that people could replicate the whole thing at a click is such that I have cut virtually all the content down to the first two tags. This is a big blow. Suffice to say, I am not too upset. The upside is that, while continuing, I am pivoting part of the sites’ purpose to feeding the AI. ChatGPT tells me that the future is robot websites with schema to feed the AI systems, and I am working on structured data based on statistics, playing down the cases as available data. The end of it is that things are changing. I never thought I would restrict my content as I have, but after 15 years I have no choice.
A surprisingly bad take
I would not have bet on Mike Masnick railing against the basic protections available via the ancient robots.txt standard.
Possible Solution
A solution I’ve toyed with is to require a particular crypto wallet address to access your content. I did not say crypto payment; I said a crypto wallet.
Firstly, it would initially block all bots, crawlers, and AI. Secondly, if you know all your visitors are crypto-savvy, it helps with building future content/products/offerings.
Eventually, let’s say AI starts having its own wallets to use. Fine. Now you add an agreement that says if an LLM is logging in to use your content, it must pay you xyz amount of xyz crypto via your receiving wallet address. They have it, you can receive it, and now AI has a way to actually pay people for what they steal.
If they steal it without paying you, theoretically you now have a legal case. But long term it’s in AI’s best interest to pay people some crypto for their content; otherwise, the content will go away. People, for the most part, will not keep producing content for no reward.