340 Local News Outlets Now Blocking The Internet Archive

from the history-is-now-a-black-hole dept

Fri, Jun 5th 2026 11:08am - Karl Bode

Earlier this year Nieman Lab broke the story that major news publishers, including The New York Times, The Guardian, and USA Today Co., had started blocking the Internet Archive for fear that AI companies might scrape the nonprofit’s repositories for training data. Given that the Archive is one of the last bastions of archival history, that is, in case you’re not aware, not very good for the public interest.

Four months later and Nieman Lab now notes that the number of news outlets blocking the archive has soared to around 340 organizations:

“Our new analysis shows that more than 340 local news sites across the United States are now limiting the Internet Archive’s ability to access and preserve their stories. Many sites in our sample are owned by five of the seven largest local news publishers in the country: USA Today Co., McClatchy, Advance Local, MediaNews Group, and Tribune Publishing. The latter two are both subsidiaries of the “vulture hedge fund” Alden Global Capital.”

Many of these localities are already effectively news deserts, where most real local journalism was hollowed out and replaced by a smattering of local right-wing broadcasters (like Sinclair Broadcasting) or a hedge fund-run “local newspaper” that doesn’t do much in the way of actual local reporting. That’s generally also been terrible for informed consensus or shedding light on local corruption.

Some of the outlets blocking Internet Archive access have legitimate concerns about protecting their hard work from being repackaged and resold without compensation or citation. But an awful lot of the folks grumbling about the Internet Archive were never in the journalism business to serve the public interest in the first place.

Regardless of motivation, hiding whatever local news remains behind paywalls, then blocking it from the Internet Archive, in turn makes it harder for everyone else to do real journalism that relies on the historical record, local journalists tell Nieman Lab:

“I cover news within a larger news desert in New York’s Rockland, Sullivan, and Rockland counties. This means I need to heavily rely on archival data of old news articles from now deceased, or zombie-fied, media outlets,” wrote B.J. Mendelson, the editor of The Monroe Gazette newsletter, in one recent petition signed by over 200 journalists. “Without the Internet Archive, my [work] would be incredibly difficult to do.”

Trying to address publisher concerns, the folks at the Wayback Machine have highlighted ongoing efforts to minimize abuse of the site, including restrictions on bulk downloading and collaborating with Cloudflare to monitor bot activity.

But even beyond AI scraping, many corporate media owners simply can’t see beyond the narrow interests of paywalled revenue. And corporate power and authoritarianism, sometimes working in collaboration, both tend to benefit from a misinformed electorate that doesn’t have a firm grip on the lessons learned from historical experience, and doesn’t have easy access to the factual record.

I’ve been a journalist for several decades, and the vast, vast majority of my work has been deleted by website owners and companies that simply couldn’t have cared any less about archival history or any sort of permanent record. My explorations of telecom policy have disappeared, but Verizon, AT&T, and Comcast’s version of the historical record generally remains. You can probably see how that’s of benefit to corporate power.

But again, smaller, independent, local news outlets on fixed budgets have particularly legitimate concerns about the tech giants’ plan to hijack and repackage the entirety of their work using AI without any compensation or attribution whatsoever. The Internet Archive folks say they are listening to those concerns, while also trying to train news orgs on archival preservation:

“In December, the Internet Archive partnered with the Poynter Institute and Investigative Reporters and Editors to train a cohort of 33 local and national news outlets on how to develop and implement an archiving strategy. The initiative, funded through a Press Forward grant, aims to train 300 newsrooms in digital preservation and in using the Internet Archive’s services by the end of 2027.”

Some other archival efforts exist, but they often involve paywalled access. That’s again a problem when you’ve got an authoritarian corporate coalition driven heavily by free propaganda, while factual reality and what’s left of intelligent U.S. analysis and journalism sits hidden behind a monthly subscription fee.

Comments on “340 Local News Outlets Now Blocking The Internet Archive”

Anonymous Coward

June 5, 2026 at 11:42 am

We run an archive

We’ve been forced to put a wall in front of it — a free one, but still a wall — because the web crawlers operated by AI companies not only took everything regardless of permissions/copyright, but they kept hitting it thousands of times at high speed from locations all over the Internet/world, thus creating a DDoS attack.

If you’re about to write “why didn’t you…?” I know. I’m intimately familiar with defenses against attacks and abuse, including sharing co-credit for inventing one. I know pretty much every possible way to defend an online operation and I know pretty much everything about those methods. The way we chose was the last option we wanted, we did everything possible to avoid using it — at considerable trouble and expense — but it’s the only way that works.

So don’t blame the IA. Put the blame squarely where it belongs: on sociopathic assholes like Sam Altman and Mark Zuckerberg and Elon Musk et al.

And by the way: this isn’t an accident. They want this, because every free archive, every unpaywalled news operation, every open web site, is giving away for free what they want to charge for.

Anonymous Coward

June 5, 2026 at 3:40 pm

Re:

So don’t blame the IA. Put the blame squarely where it belongs: on sociopathic assholes like Sam Altman and Mark Zuckerberg and Elon Musk et al.

…because their bots will be able to read the news via archive.org while avoiding the denial-of-service effect you mention? It seems like you’re muddling the message by mentioning that.

Nunya

June 6, 2026 at 7:06 am

Re: Re:

Get fucked

Anonymous Coward

June 9, 2026 at 5:45 am

Re: Re: Re:

The truth sure hurts, huh?

Anonymous Coward

June 5, 2026 at 12:07 pm

I noticed yesterday that NYT’s hostility to users is now so bad that archive.is can’t capture it reliably. Archive.org is into legality and respectability and Archive.is is run by one single Russian asshole who can’t keep up. We need a new, rogue archival project that leverages residential IPs. If only that were in Anna’s remit.

Anonymous Coward

June 5, 2026 at 4:46 pm

Re:

There’s ghost archive.

Arianity (profile)

June 5, 2026 at 3:02 pm

Trying to address publisher concerns, the folks at the Wayback Machine have highlighted ongoing efforts to minimize abuse of the site, including restrictions on bulk downloading and collaborating with Cloudflare to monitor bot activity.

Unfortunately, it doesn’t seem to be as effective as they’d like to make it out to be. :/

And I’m not really sure how you fix it, AI companies pissing in the commons seems to have ruined it for everyone.

Epic_Null (profile)

June 5, 2026 at 4:15 pm

Re:

And I’m not really sure how you fix it

By criminally charging the AI companies for everything they have done to the commons, to the tune of 2x or more of the revenue they made doing it.

Anonymous Coward

June 5, 2026 at 4:44 pm

Re: Re: True...

But wouldn’t the AI Companies ignore it and/or bribe the site owners? (Maybe wrong)

Of course, even then, I have a feeling that the IA blocking would’ve happened even without AI.

The Phule

June 5, 2026 at 4:54 pm

Re: Re:

How do you charge them a negative number? AI companies are bleeding investor capital still

Anonymous Coward

June 6, 2026 at 5:08 pm

Re: Re: Re:

I presume that’s why Epic_Null said “revenue” instead of “profits”. Or perhaps they know too much about Hollywood accounting.

Revenue is never a negative number, nor is it zero in relation to these companies—even if they’re losing money or “losing money”.

Commenter #5759 (profile)

June 5, 2026 at 3:12 pm

I wonder if these sites would agree to be archived on a time-delay. “Yes, you can make the copies now, but you can’t put them up on the website until 1/5/10 years from now.” It’s better than getting nothing.

Anonymous Coward

June 5, 2026 at 4:16 pm

Re:

I had considered much the same thing (à la “move ‘old news’ to a non-paywalled server/site”).

But the top comment here describes that it’s not simply looting and pillaging, it’s an existential attack for any archive that doesn’t have some mechanism for blocking/slowing/denying AI crawlers.

Even large companies need to pay their server support fees.

Anonymous Coward

June 5, 2026 at 3:18 pm

Is there any doubt how capitalism would act? Dog eat dog.
