If Creators Suing AI Companies Over Copyright Win, It Will Further Entrench Big Tech
from the be-careful-what-you-wish-for dept
There’s been this weird idea lately, even among people who used to recognize that copyright only empowers the largest gatekeepers, that in the AI world we have to magically flip the script on copyright and use it as a tool to get AI companies to pay for the material they train on. But, as we’ve explained repeatedly, this would be a huge mistake. Even if people are concerned about how AI works, copyright is not the right tool to use here, and the risk of it being used to destroy all sorts of important and useful tools is quite high (ignoring Elon Musk’s prediction that “Digital God” will obsolete all of this).
However, because so many people think that they’re supporting creators and “sticking it” to Big Tech in supporting these copyright lawsuits over AI, I thought it might be useful to play out how this would work in practice. And, spoiler alert, the end result would be a disaster for creators, and a huge benefit to big tech. It’s exactly what we should be fighting against.
And, we know this because we have decades of copyright law and the internet to observe. Copyright law, by its very nature as a monopoly right, has always served the interests of gatekeepers over artists. This is why the most aggressive enforcers of copyright are the very middlemen with long histories of screwing over the actual creatives: the record labels, the TV and movie studios, the book publishers, etc.
This is because the nature of copyright law is such that it is most powerful when a few large entities act as central repositories for the copyrights and can lord around their power and try to force other entities to pay up. This is how the music industry has worked for years, and you can see what’s happened. After years of fighting internet music, it finally devolved into a situation where there are a tiny number of online music services (Spotify, Apple, YouTube, etc.) who cut massive deals with the giant gatekeepers on the other side (the record labels, the performance rights orgs, the collection societies) while the actual creators get pennies.
This is why we’ve said that AI training will never fit neatly into a licensing regime. The almost certain outcome (because it’s what happens every other time a similar situation arises) is that there will be one (possibly two) giant entities who will be designated as the “collection society” with whom AI companies will have to negotiate or to just purchase a “training license” and that entity will then collect a ton of money, much of which will go towards “administration,” and actual artists will… get a tiny bit.
And, because of the nature of training data, which only needs to be collected once, it’s not likely that this will be a recurring payment, but a minuscule one-off for the right to train on the data.
But, given the enormity of the amount of content, and the structure of this kind of thing, the cost will be extremely high for the AI companies (a few pennies for every creator online can add up in aggregate), meaning that only the biggest of big tech will be able to afford it.
In other words, the end result of a win in this kind of litigation (or, if Congress decides to act to achieve something similar) would be the further locking-in of the biggest companies. Google, Meta, and OpenAI (with Microsoft’s money) can afford the license, and will toss off a tiny one-time payment to creators (while whatever collection society there is takes a big cut for administration).
And then all of the actually interesting smaller companies and open source models are screwed.
End result? More lock-in of the biggest of big tech in exchange for… a few pennies for creators?
That’s not a beneficial outcome. It’s a horrible outcome. It will not just limit innovation, but it will massively limit competition and provide an even bigger benefit to the biggest incumbents.
Filed Under: ai, big tech, copyright, licensing, monopolies
Comments on “If Creators Suing AI Companies Over Copyright Win, It Will Further Entrench Big Tech”
Is the alternative (re: not addressing copyright law with regards to AI) actually any better?
We’ve already seen companies limit API access, ostensibly because of cost, while simultaneously raising rates as the value of API access becomes more associated with AI outcomes.
We already see the largest, most influential tech companies able to far outstrip other competitors with AI, as they’re better able to parse the massive amount of data available.
Copyright currently exists to help the middlemen, not the artists, but is that an argument against applying copyright to AI, or an indictment of our copyright system in general?
Arguing that artists are already required to bend over backwards for major companies, and thus that copyright shouldn’t be enforced with regard to AI, sounds like a bit of a backwards argument.
Re:
Yes. Absolutely.
A practice I’ve criticized, and one I don’t think is all that sustainable. I mean, many of the AI systems don’t use the APIs anyway, but resort to scraping or other content access mechanisms. So while the API pricing is notable, I’m not sure it makes a difference.
I… also don’t think that’s accurate. I mean, go back just a few years ago and we kept hearing how the only companies that would be able to parse the data and offer AI were Google, FB, Amazon, and Apple.
And yet, the leaders in AI are… not really any of those guys. They’re players, certainly, but OpenAI leads, and other companies, such as Anthropic, are doing well too.
It’s true that many of these companies have now taken billions of dollars from the big tech companies, but that’s because the big tech companies haven’t actually been able to get the same results.
It is an indictment of copyright, definitely. But it’s an indictment of what is inherent in the copyright system. And will remain, even here.
But I’m arguing neither that “artists should bend over backwards” nor that “copyright shouldn’t be enforced.” I’m saying there’s NO COPYRIGHT TO ENFORCE in training, because it’s fair use.
And, in the long run that will HELP artists, and not leave them as beholden to big companies, because there will be much more competition.
Re: Re:
Not the person you originally replied to, but
What exactly does this mean? What form of “competition” are you talking about?
Re: Re: Re:
Generative AI is a tool, and needs human guidance to produce meaningful output. Licensing would make it so that only corporations could afford the cost and the bureaucracy of dealing with licensing, giving them another way to trap and exploit artists. If anybody can train a model, or use a model without licensing, then artists are able to avail themselves of the tool without signing contracts which take away some of their control over their own works.
Re: Re:
This feels like motivated reasoning. You want it to happen for fair use reasons, but this angle is pretty unfleshed out. But it solves what would otherwise be thorny tradeoffs.
You’re right that the current system only gives them pennies, but I can see why artists wouldn’t be thrilled to give up the pennies they’re getting now, given the alternative.
Re: Re: Re:
Here’s a little exercise for you, that may take time to do. Look at the number of artists and creators publishing works on the Internet, and find out what percentage make any money from their art. I think that you will find that making money from art happens because a fair number of people, in the tens of thousands, like the work, and a small percentage of those decide to support the artists and creators. Copyright doesn’t come into it, as those supporting the creators do so because they want more new works, and supporting the artists helps that happen.
As ever, the lot of most of them is very little or no money, because they do not attract a big enough audience. Also note that pre-Internet, most works submitted to publishers sat in a pile and were never even looked at, and of those that were looked at, only a few were chosen for publication, so as a result most creators made nothing from their work.
Re: Re: Re:2
I think you don’t actually understand at all how artists make their living. The way artists make their living is by being employed by companies who need them.
Generative AI renders artists obsolete, and about 90 percent of the artists currently employed by advertising agencies, animation studios, game studios and many other industries will be unemployed.
Most artists don’t want to be unemployed and would prefer to keep doing what they currently do for a living, and not have to compete against AI that is trained on their work without their consent.
Re: Re: Re:3
Most people don’t want to be unemployed and would prefer to have a job to make a living, even if it is something they don’t like doing. Imagine if everyone could do what they liked and get paid for it.
In reality, almost all artists are unemployed as artists; some make some money on the internet and a very few, relatively speaking, have it as a job.
Re: Re: Re:3
I am sure all those creators that have worked hard to build an audience while publishing via YouTube, just to give one self-publishing platform, far outnumber those who are employed by the corporations. It’s like how, at one time, landscape and portrait painters could find jobs with various magazines, or get contracts to paint landscapes and portraits, and then the camera came along and within a few years they were out of business.
Those relying on corporations to make a living may have to find a new job, while those who have built an audience via self publishing have a new tool to help them create, where it is useful. It is not as if generative AI will stop creators from making a living, but it may eliminate the need for those with technical skills who work to implement somebody else’s vision.
Re: Re: Re:3
As it stands, the random content generators aren’t going to put anyone out of business… yet.
But some, ahem, unemployment will happen as people are temporarily shuffled around in their employment status due to random content generators being able to scoop up a fraction of those jobs.
As it is, the biggest reason for the layoffs is… the economy. Oh, and Elon firing half of Twitter’s workforce, including software engineers…
Your fearmongering about the random content generators notwithstanding, that is.
Why is Allen Iverson getting all the attention nowadays?!?
What has he done, of significance, lately?
Anyone who thinks expanding copyright will help individual creators rather than corporate publishers hasn’t been paying attention the last…every single time we’ve ever tried that.
Say what?
“Let companies rip off your work, or else only Big Tech will be able to rip off your work” is not a compelling argument.
Re:
Well, your first (and largest) problem is assuming “fair use” is the equivalent of “ripping off your work.”
Fix that and the rest of your confusion might melt away.
Re: Re:
It’s absolutely not Fair Use. If a music producer wants to use a drum beat from another song and put it into their own music they have to ask the original artist for permission, get sampling clearances and pay for that. Just because you changed the tool you use to do that from Logic Pro to OpenAI doesn’t magically make that process legally any different.
No one’s asking for the expansion of copyright. Artists are asking that AI manufacturers and users are held to the same legal standards they already have to follow.
Re: Re: Re:
Using a sample is not the same as using a similar style. In your world things like jazz, ragtime, rock, etc. could not exist, because they all learned from the early adopters of a style.
Re: Re: Re:
How many producers have paid for their usage of the Four Chords?
Hell, if the lawsuits that Ed Sheeran and Katy Perry faced are anything to go by, not even following the rules will get you out of trouble. All you need is someone with an ax to grind and the resources to drag out a lengthy lawsuit.
Artists are asking for people to pay for the mimicry of their art style, which is not under copyright.
Sarah Andersen is attempting to use copyright law to stop incels from AI-generating misogynistic content. The motivation is understandable, the law is not. Copyright law would not have protected her work from this usage. On the other hand, if a large corporation generates content similar to hers – by AI or otherwise – and sues her for copyright infringement, that would be the result of the precedent she sets if she wins a copyright lawsuit based on style and usage she disagrees with.
Re: Re: Re:
That is because Judge Duffy, frankly, was biased against rap music, and was a jerk.
If a visual artist wants to use an image from another work, and clips it out and pastes it into a collage, that’s fine art. There is no reason whatsoever to treat audio sampling any differently.
Re: Re: Re:
It absolutely is. Training is the equivalent of reading. Reading is not infringement.
Yes. That’s true. But this is not sampling. They are not using the material in another work. They are training on it and producing something different.
These cases seek to expand copyright and, in particular, overturn the Google Books and Hathitrust cases.
Re: Re: Re:2
That is just fundamentally wrong, and why your stance makes no sense. It’s not “just reading”.
This is fucking ridiculous levels of justifying ripping off someone else’s work. These systems aren’t “producing something different”, they’re making slight tweaks to existing work and passing it off as new. Taking two things and cut & pasting them together still means you’re using those two works. You can argue if it’s transformative or not, but you can’t justify “[t]hey are not using the material in another work.”
Re: Re: Re:3
Do you play a musical instrument, take photographs, make videos, tell stories? If you do, are you not ripping off the works that you used to learn to do those things?
Re: Re: Re:3
So tell us, what is it doing?
Which is something that most artists also do, or do you actually believe that there is wholly 100% original art being produced by artists?
Re: Re: Re:3
Is learning 1+1=2 a copyright infringement? Is learning how to fucking read and write infringement?
Your argument, as it stands, will make education a form of copyright infringement. And will legitimately limit education to the ultrarich…
And that’s how humans make new discoveries.
The Carolingian Revolution was the culmination of these “making slight tweaks to existing works” and compiled into a codified standard. And the wheel. And fire, and the Printing Press….
And pretty much all the music, all the writing, and all the math.
I don’t like the NFT scammers promoting random content generators as a form of artificial intelligence (they are not), and these, ahem, random content generators can do good work in strictly and narrowly defined conditions…
But crowing about training these systems being no different than infringement?
That’s a copyright maximalist argument, and congratulations, YOU JUST REPEATED THE FUCKING THING.
Re: Re: Re:3
I’m afraid that you must have been misled about how the technology works. It does not “cut and paste things together”. The process for text-to-image latent diffusion starts with an image of coloured static, pseudorandomly generated from the “seed” value, an integer value. As the diffusion process steps forward, it mutates the image to reflect the keywords in its prompt as though it were “seeing shapes in static”, slowly sharpening the image. Check out this example to get a visual idea.
Once you see it pick up on features of a prompt, you can understand that it’s conditioned on words, which we all know are a smaller space than images (perhaps by the colloquial thousand?) and so share vectors with other examples of the same word. The prompt “oil painting” will evoke not only the common understanding of the phrase but many other senses of “oil” and “painting”. Diffusion users are recombining groups of features common to the keywords they use. That sort of “averaging” effect puts the elements used down in complexity, so we have to ask: are the used elements even copyrightable at their level of atomicity? It would clearly be ethical and legal for a human to look at a thousand images of a “dingleblorp” (an object no human has ever seen) and then draw a dingleblorp from memory, averaging a bit across the many they saw, so why should it be unethical or illegal for a machine to do the same? If a human can do it for a dingleblorp, then why can’t a machine, which has never seen a fire hydrant, do it with a million images of a fire hydrant?
I agree that it gets more complicated when the keywords are proper nouns, and are much more likely to reproduce something specific if used in a prompt. Prompting an artist’s name to evoke the style and techniques they use, or prompting a work title specifically, are techniques that are ethically dubious on their face, inviting cheap copying and ripoff. But the options that they represent ought nonetheless to exist: we shouldn’t exclude the sorts of uses that would be fair use or de minimis. We should have Mickey Mouse in the dataset, because someone’s going to make fair-use parody of Mickey Mouse. Techniques ought to be allowed that would evoke an artist or even a specific work but that are fair-use in context. It’s what people do with it that matters.
You may have been misled by examples where someone used diffusion software to do something that they shouldn’t have. There’s an important difference between Duchamp’s L.H.O.O.Q. and prompting “Mona Lisa Leonardo da Vinci”, and in practice some people will use diffusion in image-to-image mode, mutating an existing image just a bit or reinterpreting a provided image, in ways that are cheap ripoffs of the original. There are valid ways to use image-to-image diffusion too, but it’s user conduct that makes the difference, just as a user of a pencil can trivially infringe on copyright.
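For anyone who wants the seeded-static description above in concrete form, here is a minimal sketch in Python. It is a toy, not a real latent diffusion model: there is no neural network, no latent space, and no text encoder. A fixed list of numbers (`target`, a hypothetical stand-in for whatever the prompt conditions on) plays the role of "what the keywords suggest", and the function name `toy_denoise` and its parameters are illustrative assumptions, not part of any actual diffusion software.

```python
import random


def toy_denoise(seed, target, steps=20, rate=0.2):
    """Toy illustration only: start from seeded pseudorandom static and
    nudge it a little toward a prompt-like target on every step."""
    rng = random.Random(seed)                # the integer seed fixes the starting static
    image = [rng.random() for _ in target]   # "coloured static", one value per pixel
    history = [list(image)]
    for _ in range(steps):
        # A real model predicts the noise to remove with a trained network;
        # here a fixed target stands in for that prediction (assumption).
        image = [px + rate * (t - px) for px, t in zip(image, target)]
        history.append(list(image))
    return history


def distance(a, b):
    """Sum of absolute per-pixel differences between two images."""
    return sum(abs(x - y) for x, y in zip(a, b))


target = [0.0, 0.5, 1.0, 0.5]                # hypothetical stand-in for a prompt
run_a = toy_denoise(seed=42, target=target)
run_b = toy_denoise(seed=42, target=target)  # same seed: identical run
run_c = toy_denoise(seed=7, target=target)   # different seed: different static
```

The only points the sketch is meant to carry are that the starting image comes entirely from the seed, that the same seed and the same prompt reproduce the same run, and that the image sharpens gradually over many small steps rather than being assembled from stored pieces.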
Re: Re: Re:3
It literally is just reading. In the process it indexes, and indexing is already considered fair use.
So, the only one “fundamentally wrong” is you.
Again, fair use is not “ripping off someone else’s work.” Stop saying it is.
That is not even remotely how these systems work.
I’m sorry, but you are ridiculously uninformed.
And if that were happening, you’d have a point.
But it’s not.
So you don’t.
Re: Re: Re:3
Yes, yes it is. That’s literally what the AI does. It reads and interprets that data, but doesn’t copy it.
Nope. They’re producing something different that has a similar style. Styles aren’t copyrightable.
AIs don’t use cutting or pasting at all. Again, you clearly don’t know how AI works. The AI retains no copies of anything in the training data at all.
Yes, yes we can. You still haven’t even demonstrated copying is even occurring. Again, the AI retains none of its training material at all.
Re: Re:
Which part of ChatGPT’s systems involves “fair use”? Explain that, and your own stance might need adjusting.
Re: Re: Re:
So you don’t understand the word “reading” then?
Re: Re: Re:
I mean, this isn’t difficult:
Multiple cases have found that indexing copyright-covered work to create a new service is fair use: search has been deemed to be fair use, including the archived versions of the indexes. Book scanning for the purpose of search has been found to be fair use.
What’s happening with AI training is even less of a copyright issue than those other examples. In both of those other examples you could use the tools to see some of the copyright-covered content.
I am sorry. You do not seem to understand (1) how AI works (2) how copyright works or (3) how fair use works.
Fix that and you might start understanding things.
Re:
Somewhere along the line, copyright has expanded beyond the control of producing copies to controlling how people can use the contents of a work. Any generic form of usage licensing, such as the performance licenses that venues are required to get to allow musical performance, becomes a means of supporting another layer of parasites, and transferring money from the poorer artists to the richer artists.
While getting a license seems such a simple solution, it is hugely impractical, especially as self-publishing exists. Just who keeps track of who owns what copyrights when tens or hundreds of thousands of new works are published every day? Any licensing scheme similar to the collection societies means that a new layer of parasites is created, and the publishers take their cut of the license fees, while a few crumbs may make it to the richest creators that they have signed on. Obscure creators and self-publishers will likely find their works covered by any licensing fee, but they will not see a penny of that income.
Re: Re:
“Getting a license is easy. If you won’t do that, you don’t deserve to use my work.”
“Okay, where’s the registration for the copyright you hold? If you don’t actually hold a registered copyright, on what grounds can you claim the copyright belongs to you?”
“Wait, not like that.”
Re: Re: Re:
I know you are joking, but with a population of 333,287,557 (2022 census), and allowing photos and videos, along with created text and visual arts pieces, just how big would the copyright office have to be to handle registration of works being created in the US? A system designed to meet the needs of the legacy publishers, where it was easy to list the works published in a year in a couple of book-sized catalogues, is not going to work in a world with easy self-publishing and the flood of works that pre-Internet would never have been seen by more than a few family members.
Re: Re: Re:2
Not that big. (Also, US censuses are decennial, in years evenly divisible by 10)
In 1970, the census determined the US population to be 203,302,031. Copyrights operated under the 1909 Copyright Act still, and so registration was essential to holding a copyright in a published work. Things seemed to work okay, and they didn’t even have the advantage of all of the modern computer equipment we have now.
According to the Copyright Office report from 1970, they had 316,465 registrations, only 23,549 renewal registrations in 1969, but I don’t think that they’ve ever had more than a few hundred employees.
Bringing back formalities, or even strengthening them to new levels, as I strongly believe should be done, as well as shortening terms but increasing renewals (ditto), would require more work by the Copyright Office, but a lot more of it could be automated than before. Renewals shouldn’t require human involvement at all on the administrative side, for example. (Much like registering and renewing domain names has gone from something done by two or three people at InterNIC to a big but thoroughly computerized industry, so long as there are no complaints or disputes requiring human interaction)
Remember, the issue isn’t the number of works being created; it’s how many of those works’ creators think that it’s worth the time, trouble, and money to bother to register. I bet you didn’t register your post that I’m replying to. I know I’m not going to bother to register this one.
If the author doesn’t care enough about getting a copyright to fill out a form, submit a couple-three best copies, and pay a nominal fee (just to avoid people spamming the system — a dollar would be enough for me), then why should anyone else care about granting them a copyright?
This self-selection works wonders to keep the number of registrations, and thus copyrights, to a manageable level — we just need to make copyrights contingent on registrations.
Re: Re: Re:3
You are overlooking a big difference between the world pre-Internet and post-Internet. Pre-Internet, the works being registered for copyright purposes were those few works selected by the publishers, labels and studios. Those registered in any year could be catalogued in a few book-size volumes, which for books at least was done. Post-Internet, anybody can publish a work, and on a worldwide scale: YouTube has about 500 hours’ worth of videos published every minute, say 400 works a minute. Instagram has about 64,000 photos published every minute. Most of those works are new works, whose copyright belongs to the creator, and that is only two places where new works are published.
Add in article-length blog posts, music and books published in various places around the Internet and you have a problem that is bigger than any practical copyright registration system. Just the two examples I have given are approaching 10 million new works being published a day. That figure should also tell you how little of human creativity was being published when publication had to go via a gatekeeper.
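As a rough check on the scale, here is a minimal sketch in Python that converts the per-minute figures quoted above into daily totals. The inputs are the estimates given in this comment, not verified statistics, and the variable and function names are purely illustrative.

```python
MINUTES_PER_DAY = 60 * 24

# Per-minute figures as quoted in the comment above (unverified estimates).
youtube_videos_per_minute = 400
instagram_photos_per_minute = 64_000


def per_day(per_minute):
    """Scale a per-minute publication rate up to a full day."""
    return per_minute * MINUTES_PER_DAY


videos_per_day = per_day(youtube_videos_per_minute)
photos_per_day = per_day(instagram_photos_per_minute)
total_per_day = videos_per_day + photos_per_day

print(videos_per_day, photos_per_day, total_per_day)
```

With these inputs the two platforms alone come out at tens of millions of new works per day, so the scale problem is at least as large as stated.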
Re: Re: Re:4
I don’t think so.
Remember, I’m not saying that every published work should be registered. I’m saying that aside from a minor copyright on unpublished works (to avoid pre-publication piracy), copyrights should not be granted without registration. Only the copyright claimant should be allowed to decide whether to register, and to actually go through the registration process.
And the process should not be free; it should involve some effort by the registrant, and it should involve some monetary cost.
Most authors or other copyright claimants won’t bother — indicating that they were willing to create and publish a work without copyright acting as an incentive, and that therefore they are undeserving of copyrights. (Which should be reserved for situations where they are necessary for works to be created and published)
Further, require registration prior to publication — except for works created contemporaneously with publication, such as live performances — and expand the concept of what constitutes publication to include any public availability of the work, rather than of a copy, making performances and displays count too.
Anyone will be able to make, register, and then upload to YouTube their react video or unboxing (or whatever is trendy now; I don’t keep up with what the kids think is cool), or upload to their blog, or whatever. But I bet that by and large they will not bother to.
Which is fine — it was their choice, and now those works are immediately in the public domain. I doubt it will have much effect; who would want to pirate this stuff? And if there is a lot of piracy, well, I’m not willing to protect authors from making mistakes or bad deals any more than we protect anyone else from such things.
Maybe there’s 10 million new works every day, but I bet that if we strictly required formalities, there’d be only a few thousand registrations a day. Because when copyrights aren’t free, and aren’t automatic, and claimants need to think about it, and put in some effort and money to register, they think a lot harder about whether they really need a copyright or not.
And if a lot of works are registered, and we need to increase the staff of the Copyright Office, well, that’s why it’s part of the government. I don’t mind it being taxpayer supported. (The registration fee isn’t meant to support it — it’s meant to impose a minor but tangible hurdle on the claimant so that they don’t spam registrations)
Re: Re: Re:5
Congratulations, you have proposed a two-class copyright system, where copyrights largely exist for works that go through a gatekeeper publisher, and everything else is left unprotected. Also, if somebody has an unprotected work become popular, you have created a problem when they want to protect it, and maybe sell rights, like the movie rights. Also, you would have eviscerated the Creative Commons and open source/free software licenses.
Re: Re: Re:6
I am okay with that. Firstly, because it has basically worked out okay for the history of copyright since the Statute of Anne, and secondly because no one would be stopping self-publishing authors from 1) self-publishing, or even 2) obtaining copyrights and then self-publishing … which also is actually how things worked for the same period of history, except that now self-publishing costs less.
There is no problem at all. The work would be entirely impossible to protect. It would in fact, be in the public domain. This is deliberate.
The purpose of copyright is to encourage authors to create and publish works that they otherwise would not have created and published. If an author is willing to create and publish a work without a copyright, it is the height of stupidity to grant them a copyright, and it is directly contrary to the public interest, which is in favor of works not being copyrighted when it is not necessary for them to be.
If the author of such a work wanted protection, they should have thought of that before they created and published the work, when they’d have an opportunity to register it and get a copyright.
Again, not worried about it. Those works would also be in the public domain, unless someone were concerned enough about copyright to register for one from the beginning. It would make things a lot like the BSD License, except with no requirement of credit.
Re: Re: Re:7
Pre-Internet, the only way to publish was to use a publisher, who dealt with the registration of the copyright, which by the way was automatically granted to the author. Indeed, giving authors copyrights which they could transfer to a publisher in consideration of a royalty was how copyright worked. Also note that before publication, the author had control over who could see the work, and over who could make a copy by the laborious method of writing out a new one by hand. Indeed, the purpose of copyright protection was to give the means to stop another printer printing a release, either by getting a copy via industrial espionage, or copying a published book if it was a big hit and a second printing was required. (Producing a second printing involved all the same work as producing the first, as the type used was recycled after printing).
Also, if only 10% of works being self-published were being registered, that is a million registrations a day to be dealt with, and any delays would impact publishing on a schedule, or protecting analysis of current events.
Also, unless a registration system could deal with the real volume of new works, in a world where the schedule from completion to publication can be measured in minutes, as opposed to the months of the printed book, record, film, etc. world, you are creating a system that has two classes: those who have the time and money to register their works, and those who don’t.
Re: Re: Re:8
So what’s changed? Copying is still exactly as laborious for a pirate as it is for a publisher, or more so due to the latter’s ability to operate openly. The only leveling effect has been that publishers for some time now have refused to take advantage of modern technology.
Depends. If the work was expected to be a big hit, it would be stereotyped or later, plates would be made and kept. If it wasn’t expected it would need to be reset, but sooner or later the publisher would take those steps to avoid having to reset it any more than necessary.
First, as is typical in copyrights, patents, etc., you don’t have to wait for the registration to issue. If you’ve filed your paperwork, you’ve got your priority date, and you go ahead. The registration agency will catch up eventually.
Second, you’re damn optimistic. If it costs even a little money and takes a bit of effort, a lot of people just aren’t going to bother. I doubt you’d see anything like 10%. Some guy who goes on YouTube to talk about his political opinions, or to show off some piece of old technology he found, or just to show off his funny cat video, is not going to bother to take any affirmative steps to register a copyright. If he doesn’t care, why should anyone else?
It’s a self-sorting mechanism to determine who was actually incentivized by copyright, and therefore should get one, and who was not, and therefore should not get one. If you have a better way to figure out how to only grant copyrights where it was necessary for incentivizing an author to create and publish a work, I’d like to hear what it is.
Doesn’t strike me as being that difficult, especially since, again, all that is necessary to be done quickly is to file and deposit. Copyrights aren’t examined like patents or trademarks (and therefore registrations should not be treated as having any weight as to copyrightability) and it won’t be hard to dump it into the database and issue a registration number within a reasonable (but far from instant) time.
Re: Re: Re:9
You are destroying the creative commons license, and the GPL and similar licenses, as there would be no protection of the work unless it was registered. Further, the creative commons licenses allow a creator to decide how others can use their works. Also, how often would software need to be re-registered, especially when development is carried out in public in a Git repository?
Also, how do you prevent someone registering someone else’s unregistered work, and turning copyright against the original creator so that they can successfully monetize the work? Also, do you expect the register to include a copy of the work? Without that, how is it useful to establish priority when ownership is disputed?
The existing copyright system is not fit for purpose in the Internet age, but what you are proposing is even worse because without registration creators would have no protection unless they register their works.
Re: Re: Re:10
As noted, I don’t think it’s that dire, and also it wouldn’t bother me if it were. The GPL, Creative Commons, etc. are attempts to make the best out of what is already a bad situation with copyrights being automatically granted upon creation. If most works were in the public domain, they would not be needed.
If it is important to an author to use such a license, they would merely need to register, just like anyone else. Presumably we’d see GPL4, which would require contributors to register their contributions so that the license continued to work basically as normal.
(Although I question what happens if a contribution is in the public domain, which is a scenario that can happen now. For example, suppose a GPL-ed piece of software is modified by a federal employee in the course of their duties, which means that it is uncopyrightable per 17 USC 105. Does the GPL permit the modified, partially GPLed, partially public domain work to be distributed, or would it prohibit the distribution of whatever fell under the GPL? The answer may be instructive as to the proposed reform.)
Whereas if registering contributions was too much of a hassle, I think that it would result in contributors looking for more permissive licensing, or — or — taking advantage of the greater quantity of public domain works for which there would be no hassle whatsoever. Practically a problem that solves itself!
Same way we do that now.
As I mentioned earlier, I’m not unsympathetic to the concern over manuscript piracy, and obviously copyrights must initially vest in the author.
I would suggest that there is a weak, and short-lived copyright granted upon creation which is only useful for the purpose of providing authors with a remedy in the event that someone publishes (inclusive of public performance or display) their work without authorization. This gives the author time to shop the work around. But if the author publishes without registration, the copyright terminates. The protections should be geared to go after the specific culprits, but not members of the general public who happen to infringe; if authors want strong protection, they should register. And it should be short-lived so that works don’t molder on the shelf forever. The goal of copyright is to get works created and published that otherwise would not be, and to protect them as minimally and briefly as possible. If a work takes more than, say, 5 or 10 years to get published — a specific time period will need to be determined — then that’s long enough. There is a point in time when it is better that the work should be pirated than never known to the public at all.
Note also that an author can register and not publish, but because they’d have to deposit, the public still gets the work in the end. If one were worried about not being able to publish quickly, that would be the way to go.
Deposit is a traditional copyright requirement, and very useful for many purposes including establishing priority. It should be made quite strong. Indeed, for software, I’ve suggested in the past that it should require deposit of source with sufficient comments that a person having reasonable skill in the art could understand and usefully modify the work. And that for an author (or someone acting under their aegis, like an authorized publisher) to apply DRM to a work should immediately terminate the copyright. (Publication contracts should not permit authors to waive damages from publishers who do this, so that they have some recourse.) Further, the Copyright Office and Library of Congress should sponsor efforts to circumvent DRM, since it leaves the public better off.
Not what I’ve said, and if you look above you’ll see that, but the copyright system should strongly urge authors to register their works as soon as possible, and providing little to nothing for authors who fail to is part of that.
It worked great for centuries and there’s no reason it cannot continue to work well. Remember, the attacks on formalities began long before the Internet was dreamed up, much less before it became widely used.
Re: Re: Re:11
What do they register: the project, or every little update made public?
That worked when registration was only a requirement to protect the copyright of works about to be published, as the risk to other works was slight, requiring both the stealing or copying of a manuscript, and finding a publisher for the stolen work. (Note that before the mid-’70s, copying involved writing out, typing up, or photocopying from a paper copy.)
Registration would now need a system at the scale of Google to handle and store deposits in a usable form, and would be objected to by the traditional publishers, as any security failures of the system would allow pirates to steal their works.
I don’t think you grasp the scale of the problem: there are now more works being published in a minute or two than used to be published in a year. Also don’t forget that copyright applies to unpublished works as well, and is now important as gaining a copy of an unpublished work is now just one security breach away.
Re: Re: Re:12
That’s still the case now. Or do you think that unpublished works don’t require a copy to be stored somewhere that could be hacked?
Most of the problems you cite basically fall into one of these groups:
Here’s the thing: charging a fee and requiring registration should keep things to manageable levels. If it’s as high as you say, then I have no problem with the government having to use such a system. Google could do it, so it’s not impossible for the government.
Re: Re: Re:2
And therein lies the issue. Content creators and rightsholders would absolutely not agree to a system where they have to submit a registration for everything they create, or everything they roughly sketch or draft that never sees the light of day.
Yet they have no issues with people lining up around the city block to ask them for permission because one drum riff or one chord progression may actually be infringing if you squint your eyes and perk your ears on a blue moon. It’s entirely impractical, and they know this, but they demand it because they’re not the ones being held responsible when something inevitably fucks up. Hell, even major copyright holders can’t stop themselves from trying to DMCA their own websites off the Internet.
The entire system is basically run by a bunch of Tero Pulkinnen-level simpletons. It’s a system that is impossible to implement fairly and judiciously, but they demand it from everyone when they can’t even keep their own house in order. It’s hypocritical.
Re: Your pleas re "authors being ripped off" are unavailing.
— Sandra Day O’Connor, Feist Publications
If you wish it otherwise, the constitutional amendment process is on your left.
I suspect it will be even worse in the end. If the AI creators have to go to the gatekeepers, what is the likelihood of them being able to get the kind of data they want to train their AI vs some kind of prepackaged low quality data? How vibrant of a marketplace will there be if all the training sets are the same?
Great article, absolutely true. And if they establish precedent against AI training, who knows how many other artistic pursuits will be foreclosed? Are we moving toward style being copyrightable? That would be a nightmare.
Re:
Opponents of generative AI are already arguing for that. In the recently amended complaint of Andersen et al. v. Stability AI et al. the plaintiffs argue that their artistic styles represent “trade dress” and are therefore protectable as informal trademarks or something roughly to that effect.
Re: Re: reply to Nihiltres comment
“Opponents of generative AI are already arguing for that. In the recently amended complaint of Andersen et al. v. Stability AI et al. the plaintiffs argue that their artistic styles represent “trade dress” and are therefore protectable as informal trademarks or something roughly to that effect.”
https://www.comicmix.com/wp-content/uploads/2017/12/51-Order-on-Second-MTD.pdf
Didn’t a judge rule that styles cannot be trademarked?
Re: Re: Re: second reply to Nihiltres comment
Multiple courts have ruled that style cannot be trademarked or copyrighted.
Re: Re: Re:
I’m just saying that that’s what they’re currently arguing. See the amended complaint, for example at pp. 71–75 (it may be fastest to simply search the text for instances of “trade dress”).
I’m not a lawyer, but it seems extremely obvious that it’s an attempt to rope style into IP law.
Any time I hear about how AI needs to ‘pay’ for the content that it’s learning from one of my first thoughts regarding the authors pushing/supporting that argument is ‘Great, now about how much did you pay all the authors you learned from in order to write your stuff?’
If learning is infringement that needs paying for, then not only is culture screwed, but there are a lot of currently hypocritical authors who need to start signing a lot of checks to put their money where their mouth is.
Re:
I strongly suspect this licensing is being pushed by the traditional publishers, labels and studios, as more widespread use of AI will increase the competition that they face for eyeballs and ears to consume the works they control.
There is a term ...
There is a term that describes the people who think that further empowering copyright holders will somehow magically benefit creators: USEFUL IDIOTS.
As Mike wrote, the past attempts to further empower copyright holders have almost entirely benefited the big companies, with very little benefit going to the creators. There is no reason to doubt that any future attempts will do the same.
-skh
Re:
Oh, there’s another group of people who believe the above. People who want to chip away at basic protections like “innocent until proven guilty”, and standards of evidence that have to be brought in front of a judge before they can grant an all-encompassing subpoena for information.
There’s been one guy flitting around Techdirt who’s claimed that Section 230 must die to make sure that celebrities can sue everyone and anyone who might have besmirched them in an Internet comment. To him, Section 230 is an obstacle to his goals of unfettered mass litigation.
Re: Mind blown
As Mike wrote, the past attempts to further empower copyright holders have almost entirely benefited the big companies, with very little benefit going to the creators. There is no reason to doubt that any future attempts will do the same.
Wait wait wait, do you mean to suggest that applying the logic behind trickle-down economics to copyright might be a bad idea?
Except...
I think they know this, but they want to also aim at big tech.
They know smaller companies will get screwed, but that means they can get the bigger companies more easily.
Even though they know big tech will survive, as Leif K-Brooks says:
while some of them are much larger companies with much greater resources, they all have their breaking point somewhere. I worry that, unless the tide turns soon, the Internet I fell in love with may cease to exist, and in its place, we will have something closer to a souped-up version of TV – focused largely on passive consumption, with much less opportunity for active participation and genuine human connection.
Politicians and people who want to stick it to big tech know this (at least I would assume), so don’t be surprised by this.
Re:
Good luck, they already got bamboozled by Microsoft.
Guess who Sam Altman actually works for?
Question Mike, honest
Wouldn’t the solution be to use library content? Only.
Re:
I don’t think that would solve all the issues – what would be the proof that all materials used came from the library? Neither would it convince copyright holders that existing models were all trained with library material, even if they were legally owned.
And that’s not even going into what copyright holders think of libraries. There’s a non-zero number of them who would absolutely love to go after free public access to books.
Re: Re:
Though true, the courts have always stood with 1:1 access being legal.
Using a legal library account definitely creates a gigantic shield. It may not stop a lawsuit, but likely leads to the AI company prevailing.
There’s something to be said about going to war with the right equipment and all.
The comparison to Spotify and music labels here is asinine, because to work at all, generative AI models need more than the equivalent of musicians who’ve signed deals with publishers.
Generative AI’s strength is derived from the totality of its training data scraped across the entire internet, from both extremes of both axes of the graph: profitable to amateur, high-quality to low-quality.
Famous, successful creatives may be some of the loudest voices protesting generative works, but ChatGPT’s strength does not come from slurping up Stephen King. Its strength comes from slurping up every blog comment, every Amazon review, every SEO clickbait piece of absolute garbage, every brilliant Substack newsletter tragically under-read.
It’s conceivable that the most-valuable, most high-profile content in the world could get thrown together into some sort of record-label-esque gatekeeper consortium that requires exorbitant rates to access. But that’s not the point. Perhaps I’m ascribing too much of my own thoughts to other people’s motivations, but to me the beauty of a regime that requires copyright permission for training generative AI is that the sheer impossibility of tracking down every author of unremarkable, low-quality, amateur content effectively makes any commercial application of generative AI impossible. This is a feature, not a bug: you can’t generate low-quality thoughts created by putting a bunch of words in a blender and hitting puree if it’s impossible to figure out where to buy the ingredients for the smoothie.
From the perspective of someone like me, who views generative AI as a vehicle for enshittification and making the signal-to-noise ratio of information on the web even worse than it already is, nothing could be more delicious.