If Creators Suing AI Companies Over Copyright Win, It Will Further Entrench Big Tech

from the be-careful-what-you-wish-for dept

There’s been this weird idea lately, even among people who used to recognize that copyright only empowers the largest gatekeepers, that in the AI world we have to magically flip the script on copyright and use it as a tool to get AI companies to pay for the material they train on. But, as we’ve explained repeatedly, this would be a huge mistake. Even if people are concerned about how AI works, copyright is not the right tool to use here, and the risk of it being used to destroy all sorts of important and useful tools is quite high (ignoring Elon Musk’s prediction that “Digital God” will obsolete all of this).

However, because so many people think that they’re supporting creators and “sticking it” to Big Tech in supporting these copyright lawsuits over AI, I thought it might be useful to play out how this would work in practice. And, spoiler alert, the end result would be a disaster for creators, and a huge benefit to big tech. It’s exactly what we should be fighting against.

And, we know this because we have decades of copyright law and the internet to observe. Copyright law, by its very nature as a monopoly right, has always served the interests of gatekeepers over artists. This is why the most aggressive enforcers of copyright are the very middlemen with long histories of screwing over the actual creatives: the record labels, the TV and movie studios, the book publishers, etc.

This is because the nature of copyright law is such that it is most powerful when a few large entities act as central repositories for the copyrights and can lord around their power and try to force other entities to pay up. This is how the music industry has worked for years, and you can see what’s happened. After years of fighting internet music, it finally devolved into a situation where there are a tiny number of online music services (Spotify, Apple, YouTube, etc.) who cut massive deals with the giant gatekeepers on the other side (the record labels, the performance rights orgs, the collection societies) while the actual creators get pennies.

This is why we’ve said that AI training will never fit neatly into a licensing regime. The almost certain outcome (because it’s what happens every other time a similar situation arises) is that there will be one (possibly two) giant entities who will be designated as the “collection society” with whom AI companies will have to negotiate or to just purchase a “training license” and that entity will then collect a ton of money, much of which will go towards “administration,” and actual artists will… get a tiny bit.

And, because of the nature of training data, which only needs to be collected once, it’s not likely that this will be a recurring payment, but a minuscule one-off for the right to train on the data.

But, given the enormity of the amount of content, and the structure of this kind of thing, the cost will be extremely high for the AI companies (a few pennies for every creator online can add up in aggregate), meaning that only the biggest of big tech will be able to afford it.

In other words, the end result of a win in this kind of litigation (or, if Congress decides to act to achieve something similar) would be the further locking-in of the biggest companies. Google, Meta, and OpenAI (with Microsoft’s money) can afford the license, and will toss off a tiny one-time payment to creators (while whatever collection society there is takes a big cut for administration).

And then all of the actually interesting smaller companies and open source models are screwed.

End result? More lock-in of the biggest of big tech in exchange for… a few pennies for creators?

That’s not a beneficial outcome. It’s a horrible outcome. It will not just limit innovation, but it will massively limit competition and provide an even bigger benefit to the biggest incumbents.


Comments on “If Creators Suing AI Companies Over Copyright Win, It Will Further Entrench Big Tech”

62 Comments
Anonymous Coward says:

Is the alternative (re: not addressing copyright law with regards to AI) actually any better?

We’ve already seen companies limit API access, ostensibly because of cost, but simultaneously raising rates as the value of API access becomes more associated with AI outcomes.

We already see the largest, most influential tech companies able to far outstrip other competitors with AI, as they’re better able to parse the massive amount of data available.

Copyright currently exists to help the middlemen, not the artists, but is that an argument against applying copyright to AI, or an indictment of our copyright system in general?

Arguing that artists are already required to bend over backwards for major companies, thus copyright shouldn’t be enforced with regards to AI sounds like a bit of a backwards argument.

This comment has been deemed insightful by the community.
Anonymous Coward says:

Re:

Somewhere along the line, copyright has expanded beyond the control of producing copies to controlling how people can use the contents of a work. Any generic form of usage licensing, such as the performance licenses that venues are required to get to allow musical performances, becomes a means of supporting another layer of parasites and transferring money from the poorer artists to the richer artists.

While “get a license” seems such a simple solution, it is hugely impractical, especially as self publishing exists. Just who keeps track of who owns what copyrights when tens or hundreds of thousands of new works are published every day? Any licensing scheme similar to the collection societies means that a new layer of parasites is created, and the publishers take their cut of the license fees, while a few crumbs may make it to the richest creators that they have signed on. Obscure creators and self publishers will likely find their works covered by any licensing fee, but they will not see a penny of that income.

Anonymous Coward says:

Re: Re:

While “get a license” seems such a simple solution, it is hugely impractical, especially as self publishing exists. Just who keeps track of who owns what copyrights when tens or hundreds of thousands of new works are published every day?

“Getting a license is easy. If you won’t do that, you don’t deserve to use my work.”
“Okay, where’s the registration for the copyright you hold? If you don’t actually hold a registered copyright, on what grounds can you claim the copyright belongs to you?”
“Wait, not like that.”

Anonymous Coward says:

Re: Re: Re:

I know you are joking, but with a population of 333,287,557 (2022 census), and allowing photos and videos, along with created text and visual arts pieces, just how big would the copyright office have to be to handle registration of works being created in the US. A system designed to meet the needs of the legacy publishers, where it was easy to list the works published in a year in a couple of book sized catalogues, is not going to work in a world with easy self publishing and the flood of works that pre-Internet would never have been seen by more than a few family members.

cpt kangarooski says:

Re: Re: Re:2

with a population of 333,287,557 (2022 census), and allowing photos and videos, along with created text and visual arts pieces, just how big would the copyright office have to be to handle registration of works being created in the US.

Not that big. (Also, US censuses are decennial, in years evenly divisible by 10)

In 1970, the census determined the US population to be 203,302,031. Copyrights operated under the 1909 Copyright Act still, and so registration was essential to holding a copyright in a published work. Things seemed to work okay, and they didn’t even have the advantage of all of the modern computer equipment we have now.

According to the Copyright Office report from 1970, they had 316,465 registrations, only 23,549 renewal registrations in 1969, but I don’t think that they’ve ever had more than a few hundred employees.

Bringing back formalities, or even strengthening them to new levels, as I strongly believe should be done, as well as shortening terms but increasing renewals (ditto), would require more work by the Copyright Office, but a lot more of it could be automated than before. Renewals shouldn’t require human involvement at all on the administrative side, for example. (Much like registering and renewing domain names has gone from something done by two or three people at InterNIC to a big but thoroughly computerized industry, so long as there are no complaints or disputes requiring human interaction)

Remember, the issue isn’t the number of works being created; it’s how many of those works’ creators think that it’s worth the time, trouble, and money to bother to register. I bet you didn’t register your post that I’m replying to. I know I’m not going to bother to register this one.

If the author doesn’t care enough about getting a copyright to fill out a form, submit a couple-three best copies, and pay a nominal fee (just to avoid people spamming the system — a dollar would be enough for me), then why should anyone else care about granting them a copyright?

This self-selection works wonders to keep the number of registrations, and thus copyrights, to a manageable level — we just need to make copyrights contingent on registrations.

Anonymous Coward says:

Re: Re: Re:3

You are overlooking a big difference between the world pre-Internet and post-Internet. Pre-Internet, the works being registered for copyright purposes were those few works selected by the publishers, labels and studios. Those registered in any year could be catalogued in a few book size volumes, which for books at least was done. Post-Internet, anybody can publish a work, and on a worldwide scale. YouTube has about 500 hours’ worth of videos published every minute, say 400 works a minute. Instagram has about 64,000 photos published every minute. Most of those works are new works, whose copyright belongs to the creator, and those are only two of the places where new works are published.

Add in article length blog posts, music and books published in various places around the Internet and you have a problem that is bigger than any practical copyright registration system. Just the two examples I have given are approaching 10 million new works being published a day. That figure should also tell you how little of human creativity was being published when publication had to go via a gatekeeper.

cpt kangarooski says:

Re: Re: Re:4

I don’t think so.

Remember, I’m not saying that every published work should be registered. I’m saying that aside from a minor copyright on unpublished works (to avoid pre-publication piracy), copyrights should not be granted without registration. Only the copyright claimant should be allowed to decide whether to register, and to actually go through the registration process.

And the process should not be free; it should involve some effort by the registrant, and it should involve some monetary cost.

Most authors or other copyright claimants won’t bother — indicating that they were willing to create and publish a work without copyright acting as an incentive, and that therefore they are undeserving of copyrights. (Which should be reserved for situations where they are necessary for works to be created and published)

Further, require registration prior to publication — except for works created contemporaneously with publication, such as live performances — and expand the concept of what constitutes publication to include any public availability of the work, rather than of a copy, making performances and displays count too.

Anyone will be able to make, register, and then upload to YouTube their react video or unboxing (or whatever is trendy now; I don’t keep up with what the kids think is cool), or upload to their blog, or whatever. But I bet that by and large they will not bother to.

Which is fine — it was their choice, and now those works are immediately in the public domain. I doubt it will have much effect; who would want to pirate this stuff? And if there is a lot of piracy, well, I’m not willing to protect authors from making mistakes or bad deals any more than we protect anyone else from such things.

Maybe there’s 10 million new works every day, but I bet that if we strictly required formalities, there’d be only a few thousand registrations a day. Because when copyrights aren’t free, and aren’t automatic, and claimants need to think about it, and put in some effort and money to register, they think a lot harder about whether they really need a copyright or not.

And if a lot of works are registered, and we need to increase the staff of the Copyright Office, well, that’s why it’s part of the government. I don’t mind it being taxpayer supported. (The registration fee isn’t meant to support it — it’s meant to impose a minor but tangible hurdle on the claimant so that they don’t spam registrations)

Anonymous Coward says:

Re: Re: Re:5

Congratulations, you have proposed a two class copyright system, where copyrights largely exist for works that go through a gatekeeper publisher, and leave everything else unprotected. Also, if somebody has an unprotected work become popular, you have created a problem when they want to protect it, and maybe sell rights, like the movie rights. Also, you would have eviscerated the creative commons and opensource/free software licenses.

cpt kangarooski says:

Re: Re: Re:6

Congratulations, you have proposed a two class copyright system, where copyrights largely exist for works that go through a gatekeeper publisher, and leave everything else unprotected.

I am okay with that. Firstly, because it has basically worked out okay for the history of copyright since the Statute of Anne, and secondly because no one would be stopping self-publishing authors from 1) self-publishing, or even 2) obtaining copyrights and then self-publishing … which also is actually how things worked for the same period of history, except that now self-publishing costs less.

Also, if somebody has an unprotected work become popular, you have created a problem when they want to protect it, and maybe sell rights, like the movie rights.

There is no problem at all. The work would be entirely impossible to protect. It would, in fact, be in the public domain. This is deliberate.

The purpose of copyright is to encourage authors to create and publish works that they otherwise would not have created and published. If an author is willing to create and publish a work without a copyright, it is the height of stupidity to grant them a copyright, and it is directly contrary to the public interest, which is in favor of works not being copyrighted when it is not necessary for them to be.

If the author of such a work wanted protection, they should have thought of that before they created and published the work, when they’d have an opportunity to register it and get a copyright.

Also, you would have eviscerated the creative commons and opensource/free software licenses.

Again, not worried about it. Those works would also be in the public domain, unless someone were concerned enough about copyright to register for one from the beginning. It would make things a lot like the BSD License, except with no requirement of credit.

Anonymous Coward says:

Re: Re: Re:7

Firstly, because it has basically worked out okay for the history of copyright since the Statute of Anne

Pre-Internet, the only way to publish was to use a publisher, who dealt with the registration of the copyright, which by the way was automatically granted to the author. Indeed, giving authors copyrights which they could transfer to a publisher in consideration of a royalty was how copyright worked. Also note that before publication, the author had control over who could see the work, and back in those days making a copy meant the laborious work of writing out a new one by hand. Indeed, the purpose of copyright protection was to give the means to stop another printer printing a release, either by getting a copy via industrial espionage, or by copying a published book if it was a big hit and a second printing was required. (Producing a second printing involved all the same work as producing the first, as the type used was recycled after printing).

Also, if only 10% of works being self published were being registered, that is a million registrations a day to be dealt with, and any delays would impact publishing on a schedule, or protecting analysis of current events.

Also, unless a registration system could deal with the real volume of new works, and in a world where the schedule from completion to publication can be measured in minutes, as opposed to the months of the printed book, record, film etc. world, you are creating a system that has two classes, those who have the time and money to register their works, and those who don’t.

cpt kangarooski says:

Re: Re: Re:8

So what’s changed? Copying is still exactly as laborious for a pirate as it is for a publisher, or more so due to the latter’s ability to operate openly. The only leveling effect has been that publishers for some time now have refused to take advantage of modern technology.

(Producing a second printing involved all the same work as producing the first, as the type used was recycled after printing)

Depends. If the work was expected to be a big hit, it would be stereotyped or later, plates would be made and kept. If it wasn’t expected it would need to be reset, but sooner or later the publisher would take those steps to avoid having to reset it any more than necessary.

Also, if only 10% of works being self published were being registered, that is a million registrations a day to be dealt with, and any delays would impact publishing on a schedule, or protecting analysis of current events.

First, as is typical in copyrights, patents, etc., you don’t have to wait for the registration to issue. If you’ve filed your paperwork, you’ve got your priority date, and you go ahead. The registration agency will catch up eventually.

Second, you’re damn optimistic. If it costs even a little money and takes a bit of effort, a lot of people just aren’t going to bother. I doubt you’d see anything like 10%. Some guy who goes on YouTube to talk about his political opinions, or to show off some piece of old technology he found, or just to show off his funny cat video, is not going to bother to take any affirmative steps to register a copyright. If he doesn’t care, why should anyone else?

It’s a self-sorting mechanism to determine who was actually incentivized by copyright, and therefore should get one, and who was not, and therefore should not get one. If you have a better way to figure out how to only grant copyrights where it was necessary for incentivizing an author to create and publish a work, I’d like to hear what it is.

Also, unless a registration system could deal with the real volume of new works, and in a world where the schedule from completion to publication can be measured in minutes, as opposed to the months of the printed book, record, film etc. world, you are creating a system that has two classes, those who have the time and money to register their works, and those who don’t.

Doesn’t strike me as being that difficult, especially since, again, all that is necessary to be done quickly is to file and deposit. Copyrights aren’t examined like patents or trademarks (and therefore registrations should not be treated as having any weight as to copyrightability) and it won’t be hard to dump it into the database and issue a registration number within a reasonable (but far from instant) time.

Anonymous Coward says:

Re: Re: Re:9

You are destroying the creative commons license, and the GPL and similar licenses, as there would be no protection of the work unless it was registered. Further, the creative commons licenses allow a creator to decide how others can use their works. Also, how often would software need to re-register, especially when development is carried out in public in a Git repository?

Also, how do you prevent someone registering someone else’s unregistered work, and turning copyright against the original creator so that they can successfully monetize the work? Also, do you expect the register to include a copy of the work? Because without that, how is it useful to establish priority when ownership is disputed?

The existing copyright system is not fit for purpose in the Internet age, but what you are proposing is even worse because creators would have no protection unless they register their works.

cpt kangarooski says:

Re: Re: Re:10

You are destroying the creative commons license, and the GPL and similar licenses, as there would be no protection of the work unless it was registered.

As noted, I don’t think it’s that dire, and also it wouldn’t bother me if it were. The GPL, Creative Commons, etc. are attempts to make the best out of what is already a bad situation with copyrights being automatically granted upon creation. If most works were in the public domain, they would not be needed.

If it is important to an author to use such a license, they would merely need to register, just like anyone else. Presumably we’d see GPL4, which would require contributors to register their contributions so that the license continued to work basically as normal.

(Although I question what happens if a contribution is in the public domain, which is a scenario that can happen now. For example, suppose a GPL-ed piece of software is modified by a federal employee in the course of their duties, which means that it is uncopyrightable per 17 USC 105. Does the GPL permit the modified, partially GPLed, partially public domain work to be distributed, or would it prohibit the distribution of whatever fell under the GPL? The answer may be instructive as to the proposed reform.)

Whereas if registering contributions was too much of a hassle, I think that it would result in contributors looking for more permissive licensing, or — or — taking advantage of the greater quantity of public domain works for which there would be no hassle whatsoever. Practically a problem that solves itself!

Also, how do you prevent someone registering someone else’s unregistered work, and turning copyright against the original creator so that they can successfully monetize the work?

Same way we do that now.

As I mentioned earlier, I’m not unsympathetic to the concern over manuscript piracy, and obviously copyrights must initially vest in the author.

I would suggest that there is a weak, and short-lived copyright granted upon creation which is only useful for the purpose of providing authors with a remedy in the event that someone publishes (inclusive of public performance or display) their work without authorization. This gives the author time to shop the work around. But if the author publishes without registration, the copyright terminates. The protections should be geared to go after the specific culprits, but not members of the general public who happen to infringe; if authors want strong protection, they should register. And it should be short-lived so that works don’t molder on the shelf forever. The goal of copyright is to get works created and published that otherwise would not be, and to protect them as minimally and briefly as possible. If a work takes more than, say, 5 or 10 years to get published — a specific time period will need to be determined — then that’s long enough. There is a point in time when it is better that the work should be pirated than never known to the public at all.

Note also that an author can register and not publish, but because they’d have to deposit, the public still gets the work in the end. If one were worried about not being able to publish quickly, that would be the way to go.

Also, do you expect the register to include a copy of the work? Because without that, how is it useful to establish priority when ownership is disputed?

Deposit is a traditional copyright requirement, and very useful for many purposes including establishing priority. It should be made quite strong. Indeed, for software, I’ve suggested in the past that it should require deposit of source with sufficient comments that a person having reasonable skill in the art could understand and usefully modify the work. And that for an author (or someone acting under their aegis, like an authorized publisher) to apply DRM to a work should immediately terminate the copyright. (Publication contracts should not permit authors to waive damages from publishers who do this, so that they have some recourse) Further, the Copyright Office and Library of Congress should sponsor efforts to circumvent DRM, since it leaves the public better off.

The existing copyright system is not fit for purpose in the Internet age, but what you are proposing is even worse because creators would have no protection unless they register their works.

Not what I’ve said, and if you look above you’ll see that, but the copyright system should strongly urge authors to register their works as soon as possible, and providing little to nothing for authors who fail to is part of that.

It worked great for centuries and there’s no reason it cannot continue to work well. Remember, the attacks on formalities began long before the Internet was dreamed up, much less before it became widely used.

Anonymous Coward says:

Re: Re: Re:11

Presumably we’d see GPL4, which would require contributors to register their contributions so that the license continued to work basically as normal.

What do they register: the project, or every little update made public?

Deposit is a traditional copyright requirement, and very useful for many purposes including establishing priority.

Not what I’ve said, and if you look above you’ll see that, but the copyright system should strongly urge authors to register their works as soon as possible,

That worked when registration was only a requirement to protect the copyright of works about to be published, as the risk to other works was slight, requiring both the stealing or copying of a manuscript, and finding a publisher for the stolen work. (Note before the mid 70’s, copying involved writing out, typing up, or photocopying from a paper copy).

Registration would now need a system at the scale of Google to handle and store deposits in a usable form, and would be objected to by the traditional publisher, as any security failures of the system would allow pirates to steal their works.

I don’t think you grasp the scale of the problem, there are now more works being published in a minute or two than used to be published in a year. Also don’t forget that copyright applies to unpublished works as well, and is now important as gaining a copy of an unpublished work is now just one security breach away.

bhull242 (profile) says:

Re: Re: Re:12

Also don’t forget that copyright applies to unpublished works as well, and is now important as gaining a copy of an unpublished work is now just one security breach away.

That’s still the case now. Or do you think that unpublished works don’t require a copy to be stored somewhere that could be hacked?

Most of the problems you cite basically fall into one of these groups:

  1. Entirely speculative.
  2. Acceptable to most people.
  3. Grossly exaggerated.
  4. No different between the current system and the proposal.

Here’s the thing: charging a fee and requiring registration should keep things to manageable levels. If it’s as high as you say, then I have no problem with the government having to use such a system. Google could do it, so it’s not impossible for the government.

Anonymous Coward says:

Re: Re: Re:2

And therein lies the issue. Content creators and rightsholders would absolutely not agree to a system where they have to submit a registration for everything they create, or everything they roughly sketch/draft that never sees the light of day.

Yet they have no issues with people lining up around the city block to ask them for permission because one drum riff or one chord progression may actually be infringing if you squint your eyes and perk your ears on a blue moon. It’s entirely impractical, and they know this, but they demand it because they’re not the ones being held responsible when something inevitably fucks up. Hell, even major copyright holders can’t stop themselves from trying to DMCA their own websites off the Internet.

The entire system is basically run by a bunch of Tero Pulkinnen-level simpletons. It’s a system that is impossible to implement fairly and judiciously, but they demand it from everyone when they can’t even keep their own house in order. It’s hypocritical.

This comment has been deemed insightful by the community.
Anonymous Coward says:

Re: Your pleas re "authors being ripped off" are unavailing.

“The primary objective of copyright is not to reward the labor of authors, but to promote the Progress of Science and useful Arts.”

— Sandra Day O’Connor, Feist Publications

If you wish it otherwise, the constitutional amendment process is on your left.

Ninjasaid (profile) says:

Re: Re: reply to Nihiltres comment

“Opponents of generative AI are already arguing for that. In the recently amended complaint of Andersen et al. v. Stability AI et al. the plaintiffs argue that their artistic styles represent “trade dress” and are therefore protectable as informal trademarks or something roughly to that effect.”

https://www.comicmix.com/wp-content/uploads/2017/12/51-Order-on-Second-MTD.pdf

Didn’t a judge rule that styles cannot be trademarked?

That One Guy (profile) says:

Any time I hear about how AI needs to ‘pay’ for the content that it’s learning from one of my first thoughts regarding the authors pushing/supporting that argument is ‘Great, now about how much did you pay all the authors you learned from in order to write your stuff?’

If learning is infringement that needs paying for, then not only is culture screwed, but there are a lot of currently hypocritical authors who need to start signing a lot of checks to put their money where their mouth is.

This comment has been deemed insightful by the community.
K Smith (profile) says:

There is a term ...

There is a term that describes the people who think that further empowering copyright holders will somehow magically benefit creators: USEFUL IDIOTS.

As Mike wrote, the past attempts to further empower copyright holders have almost entirely benefited the big companies, with very little benefit going to the creators. There is no reason to doubt that any future attempts will do the same.

-skh

Anonymous Coward says:

Re:

There is a term that describes the people who think that further empowering copyright holders will somehow magically benefit creators: USEFUL IDIOTS.

Oh, there’s another group of people who believe the above. People who want to chip away at basic protections like “innocent until proven guilty”, and standards of evidence that have to be brought in front of a judge before they can grant an all-encompassing subpoena for information.

There’s been one guy flitting around Techdirt who’s claimed that Section 230 must die to make sure that celebrities can sue everyone and anyone who might have besmirched them in an Internet comment. To him, Section 230 is an obstacle to his goals of unfettered mass litigation.

This comment has been deemed insightful by the community.
That One Guy (profile) says:

Re: Mind blown

As Mike wrote, the past attempts to further empower copyright holders have almost entirely benefited the big companies, with very little benefit going to the creators. There is no reason to doubt that any future attempts will do the same.

Wait wait wait, do you mean to suggest that applying the logic behind trickle-down economics to copyright might be a bad idea?

Anonymous Coward says:

Except...

I think they know this, but they want to also aim at big tech.

They know smaller companies will get screwed, but that means they can get at the bigger companies more easily.

Even though they know big tech will survive, as Leif K-Brooks says:
while some of them are much larger companies with much greater resources, they all have their breaking point somewhere. I worry that, unless the tide turns soon, the Internet I fell in love with may cease to exist, and in its place, we will have something closer to a souped-up version of TV – focused largely on passive consumption, with much less opportunity for active participation and genuine human connection.

Politicians and people who want to stick it to big tech know this (at least I would assume), so don’t be surprised by this.

Anonymous Coward says:

Re:

I don’t think that would solve all the issues – what would be the proof that all materials used came from the library? Neither would it convince copyright holders that existing models were all trained with library material, even if they were legally owned.

And that’s not even going into what copyright holders think of libraries. There’s a non-zero number of them who would absolutely love to go after free public access to books.

@nougatmachine@mastodon.social says:

The comparison to Spotify and music labels here is asinine, because to work at all, generative AI models need more than the equivalent of musicians who’ve signed deals with publishers.

Generative AI’s strength is derived from the totality of its training data scraped across the entire internet, from both extremes of both axes of the graph: profitable to amateur, high-quality to low-quality.

Famous, successful creatives may be some of the loudest voices protesting generative works, but ChatGPT’s strength does not come from slurping up Stephen King. Its strength comes from slurping up every blog comment, every Amazon review, every SEO clickbait piece of absolute garbage, every brilliant Substack newsletter tragically under-read.

It’s conceivable that the most-valuable, most high-profile content in the world could get thrown together into some sort of record-label-esque gatekeeper consortium that requires exorbitant rates to access. But that’s not the point. Perhaps I’m ascribing too much of my own thoughts to other people’s motivations, but to me the beauty of a regime that requires copyright permission for training generative AI is the sheer impossibility of tracking down every author of unremarkable, low-quality, amateur content effectively makes any commercial application of generative AI impossible. This is a feature, not a bug: you can’t generate low-quality thoughts created by putting a bunch of words in a blender and hitting puree if it’s impossible to figure out where to buy the ingredients for the smoothie.

From the perspective of someone like me, who views generative AI as a vehicle for enshittification and making the signal-to-noise ratio of information on the web even worse than it already is, nothing could be more delicious.
