An Only Slightly Modest Proposal: If AI Companies Want More Content, They Should Fund Reporters, And Lots Of Them

from the so-stupid-it-could-work? dept

In “A Modest Proposal,” Jonathan Swift satirized politicians who were out of touch and treated the poor as an inconvenience rather than as a sign of human suffering and misery. He took what those politicians saw as two big problems and came up with an obviously barbaric way to solve both: letting the poor sell their kids as food. It was really only designed to highlight the barbaric framing of the “problem” by the Irish elite.

But, sometimes, there really are scenarios where there are two very real problems (not of a Swiftian nature) that might actually be in a position to be combined such that both problems are actually solved. And thus I present a non-Swiftian modest proposal: that AI companies desperate for high quality content should create funds to pay for journalists to create high quality content that the AI companies can use for training.

Lately, there have been multiple news articles about how desperate the AI companies are for fresh data to feed the voracious and insatiable training machine. The Wall Street Journal noted that “the internet is too small” for AI companies.

Companies racing to develop more powerful artificial intelligence are rapidly nearing a new problem: The internet might be too small for their plans.

Ever more powerful systems developed by OpenAI, Google and others require larger oceans of information to learn from. That demand is straining the available pool of quality public data online at the same time that some data owners are blocking access to AI companies.

Some executives and researchers say the industry’s need for high-quality text data could outstrip supply within two years, potentially slowing AI’s development.

The problem is not just data, but high-quality data, as that report notes. You need the AI systems trained on well-written, useful content:

Most of the data available online is useless for AI training because it contains flaws such as sentence fragments or doesn’t add to a model’s knowledge. Villalobos estimated that only a sliver of the internet is useful for such training—perhaps just one-tenth of the information gathered by the nonprofit Common Crawl, whose web archive is widely used by AI developers.

The NY Times also published a similar-ish story, though it framed it in a much more nefarious light. It argued that the AI companies were “cutting corners to harvest data for AI” systems. However, what the Times actually means is that AI companies believe (correctly, in my opinion) that they have a very strong fair use argument for training on whatever data they can find.

At Meta, which owns Facebook and Instagram, managers, lawyers and engineers last year discussed buying the publishing house Simon & Schuster to procure long works, according to recordings of internal meetings obtained by The Times. They also conferred on gathering copyrighted data from across the internet, even if that meant facing lawsuits. Negotiating licenses with publishers, artists, musicians and the news industry would take too long, they said.

I’ve discussed the copyright arguments repeatedly, including why I think the AI companies are correct that training on copyright-covered works shouldn’t be infringing. I also think the rush to rely on copyright as a solution here is problematic. Doing so would only enrich big tech, since smaller companies and open source systems wouldn’t be able to keep up. Also, requiring all training to be licensed would effectively break the open internet, by creating a new “license to read.” This would be bad.

But, all of this is coming at the same time that journalism is in peril. We’re hearing stories of news orgs laying off tons of journalists. Or publications shutting down entirely. There are stories of “news deserts” and how corruption is increasing as news orgs continue to fail.

The proposed solutions to this very real problem have been very, very bad. Link taxes are even more destructive to the open web and don’t actually appear to work very well.

But… that doesn’t mean there isn’t a better solution. If the tech companies need good, well-written content to fill their training systems, and the world needs good, high-quality journalism, why don’t the big AI companies agree to start funding journalists and solve both problems in one move?

This may sound similar to the demands of licensing works, but I’m not talking about past works. Those works are out there. I’m talking about paying for the creation of future works. It’s not about licensing or copyright. It’s about paying for the creation of new, high-quality journalism. And then letting those works exist freely on the internet for everyone.

It was already mentioned above that Meta considered buying a book publisher. Why not news publishers as well? But ownership of the journalists shouldn’t even be the focus, as it could raise some other challenges. Instead, they can just set up a fund where anyone can apply. There can be a pretty clear set of benefits to all parties.

Journalists who join the programs (and they should be allowed to join multiple programs from multiple companies) agree to publish new, well-written articles on a regular basis, in exchange for some level of financial support. It should be abundantly clear that the AI companies have no say over the type of journalism being done, nor do they have any say in editorial beyond the ability to review the quality of the writing to make sure it’s actually useful in training new systems.

The journalists only need to promise that anything they publish that receives funding from this program is made available to the training systems of the companies doing the funding.

In exchange, beyond just some funding, the AI companies could make a variety of AI tools available to the journalists as well, to help them improve the quality of their writing (I have a story coming up soon about how I’ve been using AI as a supplemental editor, but never to write any content).

This really feels like something that could solve at least some of the problems at both ends of this market. There are some potential limits here, of course. The AI companies need so much new content that it’s unclear if this would create enough to matter. But it would create something. And it could be lots of somethings. And not only that, but it should be pretty damn up-to-date somethings (which can be useful).

There could be reasonable concerns about conflicts of interest, but as it stands today, most journalism is funded by billionaires already. I don’t see how this is any worse. And, as suggested, it could be structured such that the journalists aren’t employees, and it could (should?) have explicit promises about a lack of editorial control or interference.

The AI companies might also claim that it’s too expensive to create a large enough pool, but if they’re so desperate for good, high-quality content, to the point of potentially buying up famous publishers, then, um, it seems clear that they are willing to spend, and it’s worth it to them.

It’s not a perfect solution, but it sure seems like one that solves two big problems in one shot, without fucking up the open web or relying on copyright as a crutch. Instead, it funds the future production of high-quality journalism in a manner that is helpful both for the public at large and the AI companies that could contribute to the funding. It also doesn’t require any big new government law. The companies can just… see the benefit themselves and set up the program.

The public gets a lot more high-quality journalism, and journalists get sustainable revenue sources to continue to do good reporting. It’s not quite a Swiftian modest proposal, in that… it actually could make sense.

Companies: google, meta, openai


Comments on “An Only Slightly Modest Proposal: If AI Companies Want More Content, They Should Fund Reporters, And Lots Of Them”

39 Comments
Anonymous Coward says:

Respectfully, I question the effectiveness of this proposal.

If I’m an AI company in search of quality content to train my AI on, and I’m willing to pay journalists to make said content, wouldn’t I just lay off said journalists once I’m satisfied that they’ve provided enough content for my AI? As my AI gets more quality data to train off of, my need for those journalists will gradually decline until I no longer consider funding them to be worth it.

Likewise, if I’m a journalist, my content would be used to train a machine that is ultimately cheaper than me and can produce far more content than me–and thus could force me to seek a different line of work. If quality’s the only advantage I have over an AI, why would I deprive myself of it? Also, why should I trust an AI company to keep me around once they’re satisfied that their AI’s content is of sufficient quality?

Anonymous Coward says:

Re:

You’re assuming that an AI company is going to be satisfied with one model. We’ve already seen GPT-4.

Besides which, are AIs really going to interview people? Research print records? Drill down to find the truth in a situation? Those seem like awfully hard things for a glorified language database to accomplish.

Anonymous Coward says:

Re: Re:

It is not satire, as evidenced by the lines “But, sometimes, there really are scenarios where there are two very real problems (not of a Swiftian nature) that might actually be in a position to be combined such that both problems are actually solved. And thus I present a non-Swiftian modest proposal”

Arianity says:

Re: Re:

You are aware this is satire,

The article seems to be explicitly saying it’s not: But, sometimes, there really are scenarios where there are two very real problems (not of a Swiftian nature) that might actually be in a position to be combined such that both problems are actually solved. And thus I present a non-Swiftian modest proposal:

Kind of a weird framing to invoke Swift if it’s not actually satire, though.

James Burkhardt (profile) says:

Re: Re: Re:

In respect to the narrative about AI, that the Capitalists investing in AI are not simply looking for tools to replace people, funding journalism is the obvious solution to a need for more and better quality training data.

In respect to the actual motivations of profit driven Capitalists investing in AI, the purpose is to replace journalists with lower cost/article LLMs spewing ‘content’ out at impossibly high rates to feed low effort click farming. In this world, as the original poster highlighted, the entire idea is a satire. An absolute joke. It’s not a serious proposal in any sense, because paying journalists is the opposite of the goal of the use of AI. If adopted it would, as implied by the original comment, result in the journalism world cannibalizing itself, journalists eating their own future to survive today. (Possibly a reason to invoke Swift? Just maybe?)

This proposal is not Swiftian satire. Swift was not seriously proposing eating babies, and that should be obvious to all audiences. Swift was exaggerating the callous attitudes toward Ireland to subject that attitude to ridicule. But not all satire operates this way. Some satire uses irony, and the ironic contrast between an obvious serious solution and the barriers imposed by the reality of profit driven motives similarly serves to expose the gulf between stated motives and the actualities.

Anonymous Coward says:

Re:

  1. The AI company isn’t promising lifetime employment. But in many industries worker-bees already know they can be fired the next time the boss has a bad hair day.
  2. Most journalists are unlikely to worry too much about their future AI overlords. There are so many factors that seem to drive journalists out of business, and if AI can keep them afloat a bit longer, what’s the harm?

Bloof (profile) says:

Kind of feels like the concept of trickle down economics, they could use the wealth they’ve accumulated to benefit all mankind/fund people to make quality content to fuel the machine… They won’t, we know they won’t, this never happens in the real world and no amount of think pieces suggesting it is going to make it so. The entire point of AI is to eliminate human jobs and create a new set of kings at the expense of everyone who works in a creative field, they’re not even trying to hide that fact.

The goal of AI is to do it cheaper, not better, no matter the cost to humanity.

Anonymous Coward says:

Re:

The goal of AI is to do it cheaper

No, it isn’t. Companies are in business to make a profit. If they can make a profit, and do some good at the same time, so much the better. AI companies think they’re making something useful, that people will be willing to pay for. Presumably, AI will be used by companies to increase efficiency, but to say that’s AI’s goal is totally misunderstanding things.

Anonymous Coward says:

Re: Re:

” If they can make a profit, and do some good at the same time, so much the better.”

Most business is not going to try to make things better.

“AI companies think they’re making something useful,”

haha, good one. The usefulness is not a concern.

“AI will be used by companies to increase efficiency”

No it will not. Like most new shiny objects, it will lose its luster, ending up in the attic next to the thigh master and quadraphonic.

Anonymous Coward says:

Re: Re: Re:

“AI companies think they’re making something useful,”

Haha, good one. The usefulness is not a concern.

Actually, usefulness is a concern because AI companies know if what they make isn’t useful, they’re more than likely to lose money to a competitor whose software people do find useful.

Bloof (profile) says:

Re: Re:

Welcome back from your 500 year coma, we have this thing called Wikipedia now, while not perfect you can skim through and get a good glimpse at how capitalism has gone in the time you’ve been asleep… Not good, as it happens. Not a lot of time and effort put into doing good, and the people super into AI gutting creative fields now are really pissed at anyone so much as perceived as trying to do the right thing by other humans, to the point they use the terms for equality programs to cover for slurs. Funny how that goes.

Anonymous Coward says:

The problem with the plan of “hire a lot of high-quality journalists” is that there have never been a lot of high-quality journalists.

I direct your attention to Knoll’s Law: “everything you read in the newspapers is absolutely true, except for the rare story of which you happen to have firsthand knowledge”. There was no golden age where journalism was high-quality and reliable. Even ostensible industry leaders like the NYT, Washington Post and Wall Street Journal routinely pushed complete nonsense and/or propaganda long before the internet came along.

The reason newspapers and local TV news have taken a big hit from the internet isn’t because of big tech meanies. It is because the actual business of newspapers and local TV news has always been “producing thinly-researched swill to draw in eyeballs for the advertisers”, and these days you can get thinly-researched swill for free.

Anonymous Coward says:

The problem is not just data, but high-quality data, as that report notes. You need the AI systems trained on well-written, useful content.

So what you’re saying is, feeding the output of other AI systems into your AI system isn’t going to cut it.

And here I was expecting an AI Ouroboros: AI feeding AI feeding AI until it all collapses into a single word: Nazi.

Anonymous Coward says:

Let's do some math:

According to this Reuters article, big tech will pay:

Between 5 cents and $1 per photo, and more than $1 per video
The market rate for text is $0.001 per word

https://www.reuters.com/technology/inside-big-techs-underground-race-buy-ai-training-data-2024-04-05/

But instead, consider the price for quality journalism. You can create words for basically free these days, but to gather real human experiences into words, at $0.001 per word? I doubt I could string together $100 worth of income, or 100,000 words per day, at those rates. Reddit says that the average journalist writes around 1,000 words a day. Now, it sounds like this will be high quality data, but on the other hand, you’re going to be paying a hefty price per word of training data.

While this sounds like a great idea at first, I’m not sure if this scheme would make sense for either AI companies or journalists. AI companies may be rich, but I don’t know if they are rich enough to subsidize the entire cultural industry.

On the other hand, some back of the envelope math says that AI might increase GDP by ~7% according to Goldman Sachs, compared to the ~2.3% that the cultural industries currently occupy. I don’t know how much of that increase will be captured by AI companies in the form of fees and stuff, but I think it would be fair to funnel half of that to the creation of new content, much as YouTube has its famous 55% revenue split.

Arianity says:

The journalists only need to promise that anything they publish that receives funding from this program is made available to the training systems of the companies doing the funding.

Isn’t there a major issue here? If training falls under fair use, competitors can just use it to train their own models.

It seems like what would happen is that AI companies would hire journalists to write journalism in private, and never publish it publicly. The only way to ensure a competitor doesn’t free ride on it is to make sure it never sees the light of day.

Also, requiring all training to be licensed would effectively break the open internet, by creating a new “license to read.”

It is entirely possible to construct a legal regime where there is a license to train, without creating a license to read.

Doing so would only enrich big tech,

This seems like it would have the same issue, since you’d need deep pockets.

This may sound similar to the demands of licensing works, but I’m not talking about past works. Those works are out there. I’m talking about paying for the creation of future works. It’s not about licensing or copyright. It’s about paying for the creation of new, high-quality journalism

I can’t say I see a meaningful distinction here, but if it makes people happy… sure, I guess? This feels like copyright/licensing by another name.

Anonymous Coward says:

What a load of absolute drivel.

Mike has obviously been instructed by his overlords at Google and Andreessen to viciously oppose any effort by publishers to enforce their copyrights against AI model trainers, or pass any kind of clarification to fair use in Congress.

But even Mike knows that he can’t insult and slam other policies 100% of the time…he needs to propose his own alternative at least once or twice a year.

So his idea is for big tech companies to pay a living wage to journalists to produce 2000 words a day, not compelled to by the government (obviously) but out of the goodness of their heart? Doesn’t he know how ridiculous this sounds?

Mike’s thinly veiled corporate shill policy ideas are the worst kind of joke: it would be really funny if it weren’t so damn depressing.

Anonni says:

why don’t the big AI companies agree to start funding journalists and solve both problems in one move? - Definitely not.

Definitely not – because they will end up owning the journalist and the journalist will write content with a view to continuing to be funded.
A better solution is to
– tax AI companies as a levy on their income, thus taxing the users of AI who are presumably not using AI for love and who are being charged by the AI companies in any event.
– charge AI companies for the access to copyright material on a basis of negotiated access terms, including pricing.

Material that is subject to copyright and is issued on the basis of a fee-for-access is often made available at favorable terms or even free, depending on the usage to which the material is to be put e.g. academic research.

Copyright is itself a mechanism to allow for charging users of copyright material for its use. The original copyright holder, be it an author, or be it an institution which itself employs the author, obtains an income for its use.

If copyright means anything, it means that the copyright holder gets to decide who has access to copyrighted material. AI companies should therefore be obliged to obtain a license to access the copyrighted material if that is in the copyright license under which the material is made available.

Depending on the copyright license, some material will be made available free of charge. That is the prerogative of the copyright holder as to the license he/she/they adopt(s) for his/her/their material.
