Publishing A Book Means No Longer Having Control Over How Others Feel About It, Or How They’re Inspired By It. And That Includes AI.
from the we-need-to-learn-to-let-go dept
There’s no way to write this article without some people yelling angrily at me, so I’m just going to highlight that point up front: many, many people are going to disagree with this article, and I’m going to get called all sorts of names. I actually avoided commenting on this topic because I wasn’t sure it was worth the hassle. But I do think it’s important to discuss and I’ve now had two separate conversations with authors saying they agree with my stance on this, but are afraid of saying so publicly.
I completely understand why some authors are extremely upset about finding out that their works were used to train AI. It feels wrong. It feels exploitative. (I do not understand their lawsuits, because I think they’re very much confused about how copyright law works.)
But, to me, many of the complaints about this amount to a discussion similar to ones we’ve had in the past about what would happen, if works were released without copyright, when someone “bad” reused them. This sort of thought experiment is silly, because once a work is released and enters the messy real world, it’s entirely possible for things to happen that the original creator disagrees with or hates. Someone can interpret the work in ridiculous ways. Or it can inspire bad people to do bad things. Or any of a long list of other possibilities.
The original author has the right to speak up about the bad things, or to denounce the bad people, but the simple fact is that once you’ve released a work into the world, the original author no longer has control over how that work is used and interpreted by the world. Releasing a work into the world is an act of losing control over that work and what others can do in response to it. Or how or why others are inspired by it.
But, when it comes to the AI fights, many are insisting that they want to do exactly that around AI, and much of this came to a head recently when The Atlantic released a tool that allowed anyone to search to see which authors were included in the Books3 dataset (one of multiple collections of books that have been used to train AI). This led to a lot of people (both authors and non-authors) screaming about the evils of AI, and about how wrong it was that such books were included.
But, again, that’s the nature of releasing a work to the public. People read it. Machines might also read it. And they might use what they learn in that work to do something else. And you might like that and you might not, but it’s not really your call.
That’s why I was happy to see Ian Bogost publish an article explaining why he’s happy that his books were found in Books3, saying what those two other authors I spoke to wouldn’t say publicly. Ian is getting screamed at all over social media for this article, with most of it apparently based on the title and not on the substance. But it’s worth reading.
Whether or not Meta’s behavior amounts to infringement is a matter for the courts to decide. Permission is a different matter. One of the facts (and pleasures) of authorship is that one’s work will be used in unpredictable ways. The philosopher Jacques Derrida liked to talk about “dissemination,” which I take to mean that, like a plant releasing its seed, an author separates from their published work. Their readers (or viewers, or listeners) not only can but must make sense of that work in different contexts. A retiree cracks a Haruki Murakami novel recommended by a grandchild. A high-school kid skims Shakespeare for a class. My mother’s tree trimmer reads my book on play at her suggestion. A lack of permission underlies all of these uses, as it underlies influence in general: When successful, art exceeds its creator’s plans.
But internet culture recasts permission as a moral right. Many authors are online, and they can tell you if and when you’re wrong about their work. Also online are swarms of fans who will evangelize their received ideas of what a book, a movie, or an album really means and snuff out the “wrong” accounts. The Books3 imbroglio reflects the same impulse to believe that some interpretations of a work are out of bounds.
Perhaps Meta is an unappealing reader. Perhaps chopping prose into tokens is not how I would like to be read. But then, who am I to say what my work is good for, how it might benefit someone—even a near-trillion-dollar company? To bemoan this one unexpected use for my writing is to undermine all of the other unexpected uses for it. Speaking as a writer, that makes me feel bad.
More importantly, Bogost notes that the entire point of Books3 originally was to make sure that AI wasn’t just controlled by corporate juggernauts:
The Books3 database was itself uploaded in resistance to the corporate juggernauts. The person who first posted the repository has described it as the only way for open-source, grassroots AI projects to compete with huge commercial enterprises. He was trying to return some control of the future to ordinary people, including book authors. In the meantime, Meta contends that the next generation of its AI model—which may or may not still include Books3 in its training data—is “free for research and commercial use,” a statement that demands scrutiny but also complicates this saga. So does the fact that hours after The Atlantic published a search tool for Books3, one writer distributed a link that allows you to access the feature without subscribing to this magazine. In other words: a free way for people to be outraged about people getting writers’ work for free.
I’m not sure what I make of all this, as a citizen of the future no less than as a book author. Theft is an original sin of the internet. Sometimes we call it piracy (when software is uploaded to USENET, or books to Books3); other times it’s seen as innovation (when Google processed and indexed the entire internet without permission) or even liberation. AI merely iterates this ambiguity. I’m having trouble drawing any novel or definitive conclusions about the Books3 story based on the day-old knowledge that some of my writing, along with trillions more chunks of words from, perhaps, Amazon reviews and Reddit grouses, have made their way into an AI training set.
I get that it feels bad that your works are being used in ways you disapprove of, but that is the nature of releasing something into the world. And the underlying point of the Books3 database is to spread access to information to everyone. And that’s a good thing that should be supported, in the spirit of folks like Aaron Swartz.
It’s the same reason why, even as lots of news sites are proactively blocking AI scanning bots, I’m actually hoping that more of them will scan and use Techdirt’s words to do more and to be better. The more information shared, the more we can do with it, and that’s a good thing.
I understand the underlying concerns, but that’s just part of what happens when you release a work to the world. Part of releasing something into the world is coming to terms with the fact that you no longer own how people will read it or be inspired by it, or what lessons they will take from it.
Filed Under: ai, authors, control, copyright, ian bogost, moral rights
Comments on “Publishing A Book Means No Longer Having Control Over How Others Feel About It, Or How They’re Inspired By It. And That Includes AI.”
While creators feel that their works are their children, they should also remember that when children go out into the world they live their own lives. That is, you nurture and guide them into becoming independent of you, and maybe they then go off in ways you do not understand.
I completely agree with this
It seems to me the whole purpose of publishing your work is to make it available to the world to consume. Whether it’s a textbook or a romance novel, you want people to read it and talk about it.
If you love something, let it go…
Re:
Goes for everything one puts out into the world, doesn’t it, including these comments (7:30am meta alert!). I’ve tried to put as much of the code I’ve written as possible into the public domain, because I know it’s entirely likely that I don’t even know all the ways my code can be used, and that it could be used, transformed, and adapted in ways I can never imagine on my own.
I hope that authors, or anyone who creates original works for public dissemination, would realize that they are not writing dogma. They are not creating religious texts but rather works that by their nature cannot be treated as some holy text, but instead as foundations upon which others will build, or which they will refute or transform or mash up or sample as they see fit. No man is an island, whether in their knowledge, the context of their writing, or the manner and view they espouse, which then goes into the works they write. In that sense, and in a way that certainly most writers would understand to some degree, I think T.S. Eliot was prescient (as others have noted in the 100 years hence) when he wrote, in regard to the notion of tradition, that
The realization that the future transformation of how a work is interpreted will also have an inherent impact on how the original work is read and understood is recognized by Eliot, who was keenly aware that even his works would be read in ways he did not intend and had little power to stop. His copious, frequently cheeky, and sometimes tongue-in-cheek ur-trolling in his footnotes is an indicator of his ambivalent attitude towards dogmatic interpretations. He then proceeds to talk about how it’s less about each individual who works with this aggregated knowledge and experience than about the quality of the aggregated knowledge itself, which matters far more.
I think this indirectly outlined the ethos upon which hip-hop and open source software development were built, and which cannot exist without the basic notion that it’s less about you than about the work itself. I don’t think Eliot himself could possibly have seen how his essay on literary criticism would be read as such, but it demonstrates the point, I think. The critiques of this essay seem to lean heavily on the notion that the writer must be depersonalized in his passing on of said tradition when writing, but they ignore that the writer’s depersonalization is inherently a sort of self-depersonalization, and may be better read as: one cannot and should not intentionally make what can only be accumulated experience and knowledge into a canonical, definitive culmination of the inspirations that lead to an immutable work.
(Also, with great irony I feel compelled to point out that ‘Eurocentricity’ critique of the essay was written very much with a distinct ‘white man’s burden’ tone, as if those of us who did not learn English as a first language and were educated, partially or fully, in traditions that had no direct contact with the western canon, need someone to white knight for us when in truth, I’ve viewed that the over-reliance on the western white-knight mentality undergirds the dogma upon which the CCP and its historiography is constructed. I’m no longer in academia and when I left this was not fleshed out fully yet but by this point it should be clear that even the need to distinguish “socialism” with “socialism with Chinese characteristics” is both an endorsement of the transformative nature of idea and thought as aggregation and a rejection that western white-knighting is necessary and the most essentialist notion is also something inherently foreign in the context. It’s not something that I can even say in China, but the mountains are tall and the emperor is Winnie the Pooh.
“The more information shared, the more we can do with it, and that’s a good thing.”
The fundamental principle of the Internet.
You may get yelled at, but that won’t change the reality that once information is public, it’s beyond the creator’s control.
The problem is that being an author pays a few people a tremendous amount of money, while most people who write for a living can’t really live on what they make. That’s the primary problem. I don’t know how to fix that. Copyright does a poor job at it, but there really isn’t anything else so it’s no surprise that many writers and other creatives latch onto it as the only way they can survive.
Re:
It was never the intention of copyright to allow people to “live on what they make.” Copyright was just to incentivize creation. It demonstrably does that just fine, because more people are writing and publishing books today (both overall and per capita) than at any time in history.
Clearly there’s no “problem” with the system, as far as incentivization goes.
Re: Re:
I would say it is to incentivize creation and publication. And bear in mind, it’s not the only incentive and often not even the biggest.
If there is a problem it is that we offer people copyrights as an incentive when they’re unnecessary because the authors were already incentivized enough without copyright.
Re: Re: Re:
Copyright never incentivized creation, as the publishers, labels and studios only ever accepted a tiny fraction of the created works offered to them. It was created by publishers, with the spin of creating an author’s right, as a means to replace the old system for regulating the printing industry. Also, if you think money is the primary reason for creating things, you have not been very observant of what you see on the Internet.
Re:
Anybody who latches onto copyright as a means of making a living is missing the real source of their income, and that is fans who will support them regardless of copyright, or whether the work is available for free. Copyright was designed as a means to give publishers control over the production of physical copies, where they had to print many copies in the hope that they could sell most of them, and where pirate copies could leave them with a pile of unsold books. Until recent times print on demand was not possible, and print runs in the thousands to tens of thousands were the way copies had to be made. That also applied to vinyl, CDs and DVDs.
Re:
So, publishers.
You know who also can’t live off what they make? EVERYONE FUCKING ELSE. WELCOME TO REALITY.
No one is making enough to pay their bills, let alone CREATE.
What, and ignore the Internet, learning how to host your own WordPress blog, set up a Paypal Business account, and run a business?
People do fanfiction for free, you know. Oh, and a lot of people write for free too. Ever heard of the Stormlight Archive franchise?
Re: Re:
Speak for yourself, loser.
Re: Re: Re:
If you have a job that does, good for you.
If you did come here to gloat and say I’m not trying hard enough, though, kindly take a short walk off a tall cliff, or, yanno, walk into one of the impoverished neighborhoods in America. You know, the ones with all the crime.
No, I’m not asking you to go neck yourself, because that’s against 1A, shockingly enough.
But do realize that people gotta pay their bills first before they can even consider a hobby or be creative.
If you can’t and want to cling onto your asinine beliefs of self-reliance and personal growth though…
Jan 6 happened. And you probably support it.
Re: Re:
…and if people think this is a new thing with the internet, I’m not sure which bridge to try and sell them.
Re:
It’s quite simple – producing a work of art doesn’t mean you will ever get paid for it, unless you have a prior agreement. Copyright is an imperfect attempt to prevent other people from profiting from the work at the expense of the original artist, but there’s no time in history where an artist was guaranteed an income.
The entire premise of copyright to begin with is that you can’t own ideas or words. They belong to society as a whole as soon as they’re shared. Government also recognized (at least it used to recognize) that people tend to jealously guard ideas and innovations if they spend a lot of time and energy creating them, because they fear they will lose time/money/credit etc. To make them more likely to share, the government gave creators the sole right to control the monetization and republication of their ideas for a set time, as if to say, “hey, we know this took time and effort, so you can profit from your ideas for a while in exchange for bequeathing them to society for the ultimate benefit of all.”
Somewhere along the way, that whole concept was lost. Now it’s all intellectual property. Creators now believe that they do own the ideas, and more, should have the right to form multigenerational baronies off the ownership.
And that is the source of all this. If society and creators still understood that no one owns an idea, they wouldn’t feel entitled to deny access to their ideas the same way landowners can deny trespassers.
“I want my book out there, influencing the world (but I don’t want it to have any influence on anything without my permission).”
AI does what humans do in synthesizing *new* content, but faster and cheaper
Authors have some legit beefs with AI when it actually copies them word for word, but it is rarely asked to do that. Instead, it does what people do in creating entirely new content based on reading content – that is what humans do.
So many people refuse to understand how AI works and persist in false copyright infringement claims about it.
MIKE YOU BIG DUMB POOPY-HEAD! HOW DARE YOU OFFER AN OPINION ON SOMETHING THAT I DISGREE WITH!
(Actually Reads Article)
Oh, that’s quite reasonable. Please disregard previous angry yelling and name calling.
Indexing for AI or Searching is No Different
The phrase “when Google processed and indexed the entire internet without permission” caught my attention. Technically, parsing text found on the internet and “copying” it into a token database that will be used by a search algorithm is no different than parsing and copying that same text into a token database that will be used by an AI algorithm. But somehow there is a gut reaction that use of your tokenized work by an AI algorithm just “feels” unethical?
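To make the comparison concrete, here is a toy sketch (my own illustration, not anyone’s production code, and using a naive whitespace tokenizer rather than the stemmers or subword schemes real systems use) showing that a search index and an ML training stream both begin with the exact same tokenization step; only what happens to the tokens afterward differs:

```python
from collections import defaultdict

def tokenize(text):
    # Naive lowercase/whitespace tokenizer. Real search engines and
    # LLM pipelines use fancier schemes (stemming, BPE subwords),
    # but step one is identical: text becomes a list of tokens.
    return text.lower().split()

def build_search_index(docs):
    # Search use: an inverted index mapping each token to the
    # set of documents that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def build_training_stream(docs):
    # Training use: the same tokens, concatenated into one long
    # sequence for a model to consume.
    stream = []
    for text in docs.values():
        stream.extend(tokenize(text))
    return stream

docs = {"a": "The quick brown fox", "b": "The lazy dog"}
index = build_search_index(docs)
stream = build_training_stream(docs)
```

Both functions "copy" the same tokenized text; the gut-level distinction people draw is about the downstream use, not the mechanics of ingestion.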
Re:
I think that’s because searching benefits the author. It makes it easier to credit phrases or styles to a particular book/author.
AI makes it harder to credit that person with releasing it. Even if the act of being stored in a database is the same, the outcome appears very different from google’s usage.
Re: Re:
To be fair, a lot of authors REALLY disagreed with that, leading to a series of very long lawsuits against book scanning (which eventually went against the authors).
Re:
“The phrase “when Google processed and indexed the entire internet without permission” caught my attention”
Me too, for 2 reasons.
First is that Google doesn’t index the entire internet – they honour various things such as robots.txt, which ensure that they don’t index something.
The other thing is that they only index whatever’s made available to the public. There are easy ways to prevent that. The problem is people wanting the benefits of indexing, but also wanting to be paid for it.
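As a concrete illustration, a site can opt out of specific crawlers via robots.txt. The user-agent tokens below are the names these operators have published (GPTBot for OpenAI’s crawler, Google-Extended for Google’s AI-training opt-out, CCBot for Common Crawl), though crawler names change over time, and compliance is voluntary:

```
# Opt out of AI-training crawlers while still allowing search indexing.
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# Everyone else (including Googlebot for ordinary search) may crawl.
User-agent: *
Disallow:
```

Note that robots.txt is a request, not an enforcement mechanism: well-behaved crawlers honour it, but nothing technically stops a crawler from ignoring it.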
I would expect that if any of the plaintiffs in these lawsuits can show in discovery that the Books3 data set was explicitly sought out to train the defendants’ models, or was used for new training after a C&D notice was received from an author or publisher whose material was made available in Books3, those defendants would be in a weak position in a copyright infringement case regardless of whether training the model itself ends up being fair use.
Re:
Thinking about that, Google and Microsoft in particular almost certainly would have been sent takedown notices about their indexing of Books3 from publishers for Google Search and Bing respectively, so it would be hard for them to argue that they were unaware of the infringing content at least at the original source if they used that for any recent training of their AI models.
The problem here is that you’re begging the question by assuming that a human reading a work is no different from a machine processing the same work as training for a ML model.
The obvious counter is that these two things are not necessarily the same. As a society, we’re absolutely free to decide that use of a copyrighted work in the training of a commercial product is not the same as a human reading and learning from that same work.
No need for any angry yelling whatsoever.
Re:
I guess my concern is… how? How do you do that in a manner that doesn’t wipe out reading a book, or scanning it for other useful purposes. I just don’t see it. Every solution would greatly diminish other useful things as well.
Re: Re:
You do that by recognizing that humans reading books and scanning portions of it and getting inspired by their learnings and how those learnings commingle with those people’s individual personal experiences is different from how books are shoveled into these generative models.
Humans are different from machines and we can make laws and reach case law and decisions where authors and artists are able to require permission and/or compensation before their work goes into a generative model without even touching on the risk that people will face infringement charges for getting inspired by reading books or scanning some pages here or there of them at the library, or reading some scanned and copied passages in class.
Re: Re: Re:
You do that by recognizing that humans reading books and scanning portions of it and getting inspired by their learnings and how those learnings commingle with those people’s individual personal experiences is different from how books are shoveled into these generative models.
And again how is that different? What’s the difference between me reading an author’s books to polish up on proper spelling and grammar and perhaps picking up a few of their ‘quirks’ that bleed through into my own works and an AI doing it?
Just saying ‘it’s a machine, that’s different’ does not cut it, if you’re going to make your entire argument hinge on that fact you’ll need to spell out precisely why a person reading a book to learn and fine tune their skills is fine while an AI reading a book to learn and fine tune their skills is not.
Re: Re: Re:2
The difference is that you’re not a corporate machine product built by surveillance capitalists (in this case, Meta) in order to increase shareholder value and profit and devalue human workers.
As humans, we actually read books. We interpret them differently based on our prior life experiences that can go back years or decades. What we glean from them and then take with us into the future is far more than “spelling and grammar and perhaps picking up a few of their ‘quirks’”. To think that that’s all we get from books is diminishing of what humanity is.
Re: Re: Re:3
The difference is that you’re not a corporate machine product built by surveillance capitalists (in this case, Meta) in order to increase shareholder value and profit and devalue human workers.
None of that is relevant unless you believe that me reading books to develop my skills becomes a problem should I take the resulting skill and monetize it(perhaps by gasp selling my stuff to a major company), something which would condemn the entire field of literature other than those writing purely for non-profit reasons.
‘I don’t like Meta’ and/or ‘This might negatively impact human writers’ ability to make money’ does not a valid anti-AI argument make, any more than ‘I don’t like Benz & Cie’ and/or ‘This might negatively impact horse-drawn buggy drivers’ ability to make money’ would have been a valid anti-automobile argument.
As humans, we actually read books. We interpret them differently based on our prior life experiences that can go back years or decades. What we glean from them and then take with us into the future is far more than “spelling and grammar and perhaps picking up a few of their ‘quirks’”. To think that that’s all we get from books is diminishing of what humanity is.
And again, irrelevant: that AI doesn’t get as much from ‘reading’ a book does not an anti-AI argument make, as by that standard someone differently abled (I believe that’s the proper terminology) as far as mental capacity goes, who wasn’t capable of getting as much from reading a book, would be equally barred from doing so.
Re: Re: Re:4
And yet there was talk of Ford buying streetcar companies to basically defund them. If you think I’m going on a tangent, then I think we have to admit to talking past each other.
To be clear, I’m no Luddite. Just block book titles from the bar, problem solved. Probably a waste of processing power, anyway.
Neurodivergent or not, people’s life experiences are not books. Speaking as one, especially as one who’s been to art galleries, this line of argument bothers me so much because I know that humans can make art inspired by things other than other art pieces. It looks impossible to me for AI to not do that, short of it actually becoming human-like.
Re: Re: Re:5
And yet there was talk of Ford buying streetcar companies to basically defund them. If you think I’m going on a tangent, then I think we have to admit to talking past each other.
Yeah I’ve got no idea how that’s a response to my point so talking past each other seems to be apt there.
Neurodivergent or not, people’s life experiences are not books. Speaking as one, especially as one who’s been to art galleries, this line of argument bothers me so much because I know that humans can make art inspired by things other than other art pieces. It looks impossible to me for AI to not do that, short of it actually becoming human-like.
Running with the art comparison: at its current stage, AI is stuck with the small selection of paint colors it picks up in its training and lacks the ability to choose to mix them to make any ‘new’ colors when people ask for a particular ‘picture’. But I’m still not seeing that as a reason why it needs to be treated as inherently different from a human creator as far as learning the craft goes.
Re: Re: Re:3
This issue of supposed corporate dominance in machine learning is completely distinct from issues of existing copyright law. Expanding copyright law is something I don’t want the US or any country to do. (Well, I would accept “expansions” of copyright law only if those expansions would solely expand copyleft and/or free culture.) Actually, my proposed solution to reducing and preventing corporations from dominating the machine learning domain would be for the US to pass a law mandating that all machine learning outputs and derivatives of such outputs be under public copyleft, thereby making it legal for the general public to use any ML outputs and derivative works that ML model users (including but not limited to corporations) publish in any capacity.
Which is irrelevant, because what matters is the human who trained the machine learning model and any human who does something using the output of the model. The issue of machine learning is not human reading vs machine reading, but human writing after reading vs human writing after prompting a machine which read. (I made a super verbose version of this argument in a different comment.)
Re: Re: Re:2
The difference is that “Humans/Organics Are Special” and that’s all it really comes down to. There isn’t any real logical argument to it.
Think about it – if instead of “machines” in this argument, what if we were talking about a biological non-human entity that was capable of perfect recall? If they wouldn’t be violating copyright by reading a book, then why would a technological non-human entity capable of perfect recall be in violation?
And if they would be in violation, why? Because they can do something humans can’t?
Re: Re: Re:
How to miss the point. There are things that one can do with scanned books that benefit society, or advance human knowledge. For a start, human knowledge in many fields has become too large for individuals to keep track of it all, and an AI could do a better job than indexes at finding useful, relevant papers for researchers in a field, especially as it can cast a wider net.
Re: Re: Re:
The difference between having a human read books and having a machine learning model read books in a training set is not meaningful because in the end what makes the reading meaningful is what humans do after the books have been read.
If the reader is a human, the human can use knowledge and perspectives they gain from reading the books for purposes unrelated or related to writing. If the “reader” is a machine learning model, the model itself does nothing, but a human can use the knowledge and perspectives they gain by giving the model prompts about things unrelated or related to writing. The machine learning model is a tool which when used conveys both similar benefits and different benefits from reading books directly.
Should the US pass a law to make it illegal to train a machine learning model on books without negotiating permission and compensation with artists? My answer is no, because it should not be illegal to train an ML model on all of the books in a public library. Or at the very least, I believe that such a law, if passed, would be almost completely distinct from existing copyright law, and therefore would be an unprecedented and massive expansion of what copyright means according to the Copyright Clause in the US Constitution and the First Amendment. After all, if a human uses a model trained on books to create a new product, then a law forcing compensation of the authors of books in the training set would restrict the original expression that the model user put in the new product. US copyright law does not place weight on “sweat of the brow” alone, but on the expression in the work being evaluated. A new book created by a human using an LLM would be predominantly composed of that human’s expression. Therefore, the authors of the books in the LLM’s training set are not entitled to compensation by default any more than they would be had the author of the new book not used an LLM in the creation process.
You might ask, what if the human who trained the machine learning model didn’t get all of the books through legal methods? In that case, the machine learning model’s presence and role are completely irrelevant as far as copyright law is concerned, because what was illegal was how the human creator of the model obtained the books in the training set. If the books were obtained legally, then questions of compensation would be relevant only if the human prompting the model were to write the new book whose contents in isolation would infringe on an older book. Whether the creation process of the new book involved using a model doesn’t matter.
(Tangent: since the output of a machine learning model is probabilistic, I think that ML output in isolation should be uncopyrightable and should be automatically considered part of the public domain.)
Re: Re: Re:
OK. How do you allow humans to read the work online but not the models?
In this case, only in two ways – the number of things they can read in a certain period of time, and how quickly they can output something inspired by them. How do you control this?
Re: Re:
Expanding licensing similar to what we do for things like say, movies based on books, seems like it’d be basically fine? Those licensing laws don’t mess with things like scanning or reading, for the most part.
Our current copyright laws are pretty messed up, but the underlying concept of copyright seems mostly fine. And it’s pretty easy to port it over almost 1:1.
Re: Re: Re:
Two problems:
1) Where thousands of works are used to build a model, the value of any individual work to the builder will be very small.
2) That will restrict the use of works to large corporations, as they have the administrative staff to deal with licensing.
Re: Re: Re:2
Those are legitimate hurdles, but they don’t seem insurmountable.
1 is unfortunate, but at the end of the day, if you’re only a small contribution, maybe a small payout is ok?
2 you’d probably have to centralize licensing in some way. Perhaps the aggregator (in this case Books3) would be responsible for coordinating it when they build the data set? Or put it onto agencies/publishers.
But it doesn’t seem too different than, say, the Spotify/Pandora/YT-type models. Many small artists, bundled, each getting a small part of the pie.
Re: Re: Re:3
The biggest issue I take with that is that, unless it is a mandatory license for all books for a single fee, only a corporation will be able to afford the administrative staff to deal with thousands of licenses. And if you think a collection agency is a good idea, look at how the music collection agencies work: the money goes to the labels of the best-selling artists, and none to those with lower sales volumes. By the time the agencies and labels take their cut, and subtract administrative fees, very little actually ends up in the pockets of artists.
While all the noise is being made by a few authors, there are many thousands who are silent – or are you going to cut the self-publishing authors out of any licensing deal? If the payment is small, who pays the administrative bill, which could well exceed any viable licensing fee, unless only a very limited set of ‘big name’ authors get any benefit from licensing?
Re: Re: Re:4
I don’t think the music industry is in a good place, but it seems better than, well… zero dollars?
In the worst-case scenario, it’s not worth dealing with the hassle, and the author is basically back to where we are now (arguably better: you’re still getting $0, but at least you’re not also simultaneously training your potential replacement). The current usage is basically equivalent to what it would look like if the administrative fee ate up the entire payment to the author.
While I do think it’s a situation that’s open to abuse, I don’t think it’s guaranteed (especially if we actually do anything as a society about oligopolistic industries like record labels). And there is some room for hope, because authors will have more leverage – this isn’t the only form of income for them, whereas a music artist pretty much has to go through record labels to get anywhere. And that’s never mind that Spotify has extra moats (in terms of network effects) and extra challenges (it has to go through music labels, which eats most of its margin even if it wanted to be more generous). It’s not clear these datasets would have the same type of moats/challenges.
And if it goes through, say, publishers, they’re not at the mercy of publishers any more than they already are.
There is at least some potential for upside, although as with anything it’ll be at risk of regulatory capture and the like
Re: Re: Re:5
Do you realize that the music industry’s public performance licenses are a way of transferring money from the musicians that play the pubs and clubs etc. to the pockets of the labels and ‘big stars’?
Licensing, when there are hundreds of thousands of creators involved, many of whom self-publish – and don’t forget that it is images as well as the written word that go into training AI – ensures that it would be a ‘public performance’ style of licensing benefiting very few creators, and probably only those with contracts with publishers.
Re: Re: Re:
Not really. For example, the recent movie The Creator is an amazing piece of work, especially for its budget. But, one of the biggest criticisms is that its story and overall plotlines are derivative of what came before.
Should Gareth Edwards be paying James Cameron because people drew comparisons to Avatar? If so, how much does he owe for the parallels drawn to Ferngully?
Re: Re: Re:
Not possible, as a movie is a derivative work of one, or occasionally a series of, books. An AI model is not clearly a derivative work, and its inputs are thousands of books. So licensing that works for “I want to make X based on a particular work” does not work when thousands of works are analyzed to extract a statistical model.
Re:
I don’t see how his argument hinges on eliding a distinction between human and machine. The relevant thing is that the work was released to the public.
Re:
But it is not yet decided whether a human reading a book is different from an AI training on it.
It does not really matter what society has decided; it’s what the courts decide. 10,000 authors and 10 million readers say the machines owe us. The court, following the precedent in the Google Books case, says the books have not been copied – case dismissed.
Re:
1) true, they’re not exactly the same,
but
2) the difference doesn’t actually make any difference to the validity of the argument.
Frankly, I’m okay with the machine-learning aspect of these LLMs. Those are good. Those could lead to better spellcheckers and improve certain workflows in many, many industries.
Not so much the executives trying to use procedural content generators to replace human work or worse, justifying the cheapening of the process because “humans are a liability”.
Fucking Line Must Go Up mentality has to stop.
This is very fact specific
I have written a lot of books and back in the day made a lot of money from them. I agree that it is fine for AI models to chop them up and use them for training data. The Google Books case should have settled that, since one of the reasons the court found for Google is that they used the scanned books to produce things unlike copies of books: an index with snippets, and interesting research like tracking the usage of words over the years.
The problem is that LLMs cannot tell you where their results come from, and if they start regurgitating recognizable chunks of my books, I would not be happy. (I would be differently unhappy if they invented nonsense quotes and claimed I wrote them.)
The Getty Images suit against Stability AI shows examples of generated images with recognizable Getty watermarks, so I have a lot of sympathy for Getty there. If LLMs could show their work, so we could tell the difference between research and cut’n’paste, these arguments would be a lot easier to resolve.
Re:
That would usually be in the training data.
And that’s the problem, these for-profit businesses are… using the products of their research arms to generate profit. Directly and not through copyrights or patents.
Re:
They’re not really regurgitating works at all, though. That’s not how they work. They’re just creating probabilistic engines suggesting which words they think fit next that would meet the prompt.
I think that’s a misinterpretation. What Stability was doing there was not regurgitating, but rather because it was trained on so many images with the watermark, it thinks that watermarks are things that show up in lots of photos, so it might just put it there as probabilistically making sense to be there, and not using something it copied.
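For what it’s worth, that “probabilistic engine” idea can be sketched as a toy next-word sampler. Everything here is invented for illustration: the hand-written probability table stands in for what a real model derives from billions of training examples.

```python
import random

# Invented next-word probabilities; a real LLM derives these from
# its training data rather than from a hand-written table.
next_word_probs = {
    "the": {"cat": 0.5, "dog": 0.3, "watermark": 0.2},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.6, "sat": 0.4},
    "watermark": {"appears": 1.0},
}

def generate(start, steps, rng):
    """Repeatedly pick a likely next word; no copying, just sampling."""
    words = [start]
    for _ in range(steps):
        probs = next_word_probs.get(words[-1])
        if probs is None:
            break
        choices, weights = zip(*probs.items())
        words.append(rng.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the", 3, random.Random(0)))
```

Nothing in that loop stores or replays a source text; it only ever picks from the probability table. That is why a frequent feature like a watermark can “probabilistically make sense” in the output without being copied from any one image.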
Re: Re:
I mean, I think you could argue that probabilistic regurgitation is still a form of regurgitation. Does it only count as regurgitation if it’s 1:1? 1:1 is the most extreme form (where the probability space only has 1 point), but it’s performing the same style of function.
To me, I think what fundamentally separates regurgitation is whether the input is changed, or added to, by the thing itself. And when I say added to, I mean beyond mixing 2 or more inputs with each other. AI models don’t do that (right now). It’s not the probabilistic part, or whether it’s 1:1 that matters. If all you’re doing is picking between a probability space built up of inputs, that’s regurgitation, because it’s completely reliant on the inputs. In some sense, there’s nothing “new” being added, you’re just picking (on a very fine, mathematical scale) what already exists.
“might just put it there as probabilistically making sense to be there” seems like just a rephrase of regurgitating, no? The definition of regurgitation is literally
“repeat (information) without analyzing or comprehending it.”
And that’s kind of exactly why they pop up – the AI isn’t comprehending it. It’s analyzing it in a mathematical sense, but not in the sense of whether it should be a part of the image or not. There’s no analysis beyond how often this shows up in the training set.
Re: Re: Re:
In the second paragraph of his previous comment, Mike’s point was that the presence of the watermark in the machine learning model’s output image does not indicate whether the output image is remotely similar to any of the images in the training set. In other words, the presence of the watermark is not a smoking gun as far as whether the output image is a copyright infringement.
Actually, I used an incorrect framing in the previous sentence. What matters is what the humans do, so an output image is not a copyright infringement for merely existing. The infringement would follow from what the human prompter does with the output (e.g. shares the output image so that other humans can see it). Did the creator of the ML model obtain the input images using an illegal method? That’s distinct from what the humans on the other end of the model do. The model itself is irrelevant.
Whether the ML model regurgitates something or not is irrelevant. What if an ML model regurgitates a (flawed or unflawed) copy of one of the works in the training set? 1. Well, there’s no infringement if the human prompter ignores it. 2. If the human merely publishes the raw output, then that would be infringement by the human the same way it would be infringement had the human manually typed out the same output by chance without using the model at all. 3. And if instead the human uses the output to create a new work, then the new work may or may not be infringing based on the contents of the new work in isolation.
If the model doesn’t regurgitate a work in the training set, then everything’s well and good. Only case number 3 from my previous paragraph would apply.
Re: Re: Regurgiwhatever
LLMs do indeed give you a token string, but it’s what comes out that matters. You probably remember plagiarism court cases where a composer did not remember listening to the song he copied, but it was clear he did, so he lost. Similarly, it doesn’t matter how Stability is generating Getty trademarks, just that they were in the training data and they show up in the output.
I expect that now that the LLM crowd is realizing that it’s a problem that they can’t trace input to output, they’ll at least try to fix it.
Re: Re: Re:
The fact that the training set contains certain images does not imply that what comes out is noticeably similar to the input images. The most you can say for sure is that any output image containing an approximation of the watermark infringes on the copyright on the watermark (which usually is not what people think about).
For what it’s worth coming from a layman, I don’t think there is a definite way to establish direct links from input to output for a machine learning model, whose outputs are probabilistic and based on weights which approximate an entire training set as opposed to particular images.
As a fan of copyleft licenses, I would love for ML model makers to come close though. Using another model to identify similarities between outputs and training set images, then letting humans check the comparisons, would go a long way.
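A minimal sketch of that similarity-check idea. The vectors, names, and threshold here are all hypothetical stand-ins: a real system would get the embeddings from a learned image or text encoder run over the training set and the outputs.

```python
import math

# Hypothetical embedding vectors standing in for a learned encoder's output.
training_embeddings = {
    "photo_001": [0.9, 0.1, 0.3],
    "photo_002": [0.1, 0.8, 0.4],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def flag_for_review(output_embedding, threshold=0.98):
    """List training items suspiciously close to an output, for humans to check."""
    return [name for name, vec in training_embeddings.items()
            if cosine_similarity(output_embedding, vec) >= threshold]

print(flag_for_review([0.9, 0.1, 0.3]))  # → ['photo_001']
```

The point is the division of labor: the automated pass only narrows thousands of training items down to a short list of close matches, and humans make the actual judgment about whether anything was regurgitated.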
Re: Re: Re:2
I formatted my quotes wrong. (Sorry, I clicked the submit button when I meant to click the preview button.) The excerpts I quoted from spamvictim’s previous comment should look like this:
Re: Re:
Ask a large generative AI model for the first sentence of A Tale of Two Cities. You’ll get the first sentence of the book. I just tried this with Claude and got all 120 words. Some models don’t give you all of it, but some do.
It’s not clear how much of the book the model can reproduce, or how many books are popular enough to be output verbatim this way. For some books, you can ask for the next sentence and keep getting actual book text. Eventually, you do start getting generated text, which would probably make some authors unhappy. If output starts with your work and then veers off into hallucination/confabulation, it’s not great.
Re: Re: Re:
… you’re asking it to quote a book and you’re surprised that it’s quoting the book?
Re: Re: Re: People can do this too... So what?
“It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife. However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered as the rightful property of someone or other of their daughters.”
(Pride and Prejudice)
Sorry, I’m just quoting a novel’s famous opening lines — what I just did here is actually much closer to plagiarism than what an LLM AI does.
Re: Re: Re:
” If output starts with your work and then veers off into hallucination/confabulation, it’s not great.”
Or it’s fanfiction if a human does it. In any event, you can still call it fair use because it’s not using the author’s actual text beyond the first couple of sentences.
Re: Re: Re:
One of Stephen King’s magnum opuses (yes, he has more than one), is the Dark Tower series. The entire series is inspired by a poem by Robert Browning, titled “Childe Roland to the Dark Tower Came” and the sentence “The man in black fled across the desert, and the gunslinger followed.” set up several decades of creative work.
The later books have been criticised for the way they went toward the story’s conclusion, especially as he incorporated many real life events following his accident into the story, and a lot of his output in that time period can be traced to the pain drugs he was taking (he apparently doesn’t even remember writing Dreamcatcher, written at the same time). But, there’s no doubt that at least the first 4 books of the series stemmed directly from that poem and I doubt that an AI would write the same thing just because.
So, do you think that King owes the estate of a long-dead author something for taking his direct inspiration, or does that only apply when the later books went in a different direction? If you don’t agree with those things, what specifically separates King from AI in terms of what you’re thinking of?
Also, if you’re going to test supposed problems with ML written output, it might be best not to use a very famous book that’s in the public domain as your starting point. That may skew the results.
Re: Re: Re:2
Oh, and maybe another better example – Fifty Shades Of Gray. Quite famously, that series started as Twilight fan fiction, but the author rewrote it to remove those references in order to get it published.
How do you allow that series to exist with its own “original” status, but still claim that an AI trained on the Twilight books can’t count because they took from that source?
I’m not saying there’s a right answer here, but ultimately, humans are “trained” on copyrighted works too. There’s a problem with the speed and volume in which these models can churn out something, but it’s problematic to say the least to say that “training” on copyrighted work attracts an automatic fee.
Also, if you’re using a public domain work like A Tale Of Two Cities that’s been legally copied and riffed on for many years as your starting point to “prove” that AI will just copy, especially if you only use 120 words which can be legitimately quoted by new authors… that’s not a great example.
We’re in an age where lazy and corrupt people will use the tech for bad things, as every new tech has been used before it. I just don’t think this is the right argument against it.
Re: Re:
Actually, I’ve been in enough arguments (especially but not necessarily on the Internet) to conclude that in truth, a surprisingly large proportion of people construct their “arguments” in essentially the same manner as an LLM AI does.
Re:
The AI treats the watermark exactly the same as a face or any other common features in images. Pointing to the watermark as some gotcha is the same as pointing to other common features in images being reproduced by an AI as some kind of proof of infringement. Suing Stability AI for producing portraits of humans with correct skin-tones because Getty have images with humans makes just as much sense.
Re:
As the models do not store recognizable chunks of any work, it cannot be cut and paste. The Getty watermark was a case of a model recognizing that something went there in many images but, being a stupid machine, not recognizing that it was an added mark to claim ownership.
If you are an author and look in the Books3 list and have a fit that your work is being used, would you have an issue with your work being used in a literature class for students to break down and interpret or learn a writing convention from? If you have no issue with people using your work to learn from, why do you have an issue when an LLM uses it to learn from?
Re:
I am guessing the majority of authors would have a problem with their books being used in classrooms without compensation, judging by the consistent battles over the prices of textbooks.
Re: Re:
I have news for you: writing textbooks is a poorly paid occupation, while selling them to students is highly profitable. So it is publishers who raise a stink about free use of textbooks, second-hand copies, and grey-market copies. The authors see nothing, or very little, of the money flooding into academic and related publishing.
Re: classes are not AI
The usual reason is that the students are reading copies of the book that they bought, not ones that they stole.
Re: Re:
Which has nothing to do with the machine learning models themselves. If the creator of the model used illegally obtained books in the training set, then the model creator committed copyright infringement.
I wrote in a [different comment]:
Re: Re: Re:
I messed up making my link. It should be
I wrote in a different comment
Re: Re: "stole"
You keep using that word. It does not mean what you think it means.
I think the biggest thing is how people differ from AI. Both take in and learn from other people’s work. But when a person goes to create something inspired by what they learned, they bring things to it that weren’t part of what they learned from. Creators can respect that; it’s what they did themselves, after all. But AI can’t do that. In its current state it’s limited to basing what it does only on what it’s learned, without being able to go beyond that. I mean, AI currently can’t even take “A equals B” and infer that that must mean “B equals A”. That leaves creators feeling like someone took their book, changed the names of places and characters, and published the result as a completely new work. It may be different from the original work, but it’s hardly a creative change or addition.
This isn’t helped by all the idiots doing things with AI like deliberately training it on one creator’s work and using it to come up with crud they can market using the creator’s name and reputation to boost their sales.
Re:
This distinction is untenable. Humans do not have access to ideas they’ve never learned. We’re all limited to basing what we do only on what we’ve learned — our training set.
Re: Re:
We do have access to ideas we’ve never learned. How do you think someone comes up with an idea for the very first time? Even small children can take “A equals B” and come up with “B equals A” without being told, as soon as they grasp what “equals” means. AI can’t do that, not at its current level. We don’t know how humans do it, but we’re capable of inductive logic where computers aren’t. And I don’t think we’re going to get true AI until we figure out what it is that makes us capable of that.
Re: Re: Re:
Combining and pattern matching what they’ve learned?
Re: Re: Re:2
Add in observation of things, and lots and lots of failed experiments..
Re:
Sometimes. But not always. There are some major works of art that have been accused of plagiarism.
True, but humans have been doing that as well, and they were tolerated.
Re:
My view is that the model itself is irrelevant, because humans are the ones doing things on both sides of the model. As I wrote in a different comment:
Or in summary:
Which is divorced from the issue of whether the ML model creator legally obtained the books in the training set.
let the lawyers squabble over copyright
Here’s my take on AI: it might be a huge boon to showrunners and IP owners with followings. I just read an article about Vince Gilligan and his misgivings about AI. My first thought was, what does that guy have to worry about? His fans will follow him anywhere. AI is not going to replace him. People without his creativity and talent might need to worry.
Professionals who use AI to mimic Gilligan’s very popular style might be able to do knockoffs but they’ll be inferior and even if they’re as good or better than his stuff, they still can’t say they’re “Vince Gilligan” so his fans will still follow his shows and not the knockoffs.
But let’s say I’m not a professional and I want to re-do Lord of the Rings, Vince Gilligan style, either because I love both or maybe I hate LOTR and think it needs improvement. Feed that into the AI maw and let it spit out something with the Lord of the Rings actors in a Breaking Bad plotline, with dialogue that’s a mashup of LotR and Breaking Bad.
The result would be a silly mess but whoever makes it and uploads it to YouTube certainly wouldn’t expect to get paid other than YouTube tossing them a tiny cut of the ad revenues. It would be done for notoriety and entertainment value.
It can’t be stopped but far from replacing Gilligan’s work, it would serve as PR for him, keep him top of mind, and help promote his current stuff. I could see a whole series of parodies like this, a la the Downfall videos, mashing up everything you can think of. A small percentage of the mashups will be crazy, funny or good enough to go viral. Showrunners or IP owners with a following can either make use of this to their own advantage or get run over by it.
The copyright lawyers can run around making sure the LOTR/Gilligan knockoff makes no serious money, but they won’t stop it from existing while the marketing side of the business should be encouraging it.
PS I dibs Star Wars crossed with Deadwood.
Re:
Your talk about how people can use AI mashups to their advantage in marketing reminds me way too much of the NFT/Metaverse garbage.
“What if you took Vince Gilligan and Lord Of The Rings and made an AI mashup of them and then Vince Gilligan used it to market himself” sounds like “What if you owned Mario in a Mario Kart NFT game and could sell him to other people” or this cringe-ass “marketing” that Meta and Wendy’s did.
its only infringement in special cases
Training is only infringement if
1. The work was pirated.
or
2. You include a terms of use contract in the sale and have the customer explicitly agree to it. No, an implied agreement isn’t enough.
This was a really nice and kind of poetic piece.
Most smart writers don’t try to stop piracy but find ways to work around it. Selling books is just no longer a viable business model, so most now sell extensive courses and engage their audiences (or have an AI do it!).
The rulings on search engines and their transformative nature would seem to apply here. I don’t expect the creators to win.
Ol’ Blue Eyes sang it best about this idiotic lawsuit:
https://www.youtube.com/watch?v=7j9-m9Mbv_Q
AI generated content lacks protection
I think it is fine when companies hoover the internet to feed the giant maws of the AI industry (I’m envisioning something like the movie Mortal Engines, except for data). However: when you query the AI and receive an answer that you then use in a work, your work should then not qualify for copyright protection, since you plagiarized who knows whose work. I imagine not claiming the copyright would not alleviate the problem.
If you’re writing a research paper, that’s great, but you can’t attribute just the AI, you need to attribute the source for the AI in each case.
Re:
It may be poor citation practice, but it isn’t plagiarism.
Not disagreeing at all
The AI thing sounds an awful lot like the repeating moral panics authors get themselves into over negative reviews, and how they might be able to stop or punish people posting them. As an author, I know how much a bad review stings, and I also learned the answer to that is to stop reading reviews.
There are two prongs to the current (and really annoying) author panic about AI – one is that it could be eating into sales, for which there is no evidence at all. There is a lot of human-made shitty fiction put up for sale, and yet the best AI generated text would only come up to the same level of success as the human garbage. You would sell some of it to some people, but it’s not going to hit any best seller list.
The other is a moral concern that AI is being used for evil purposes, putting people out of work while serving up inferior service to their former clients and customers. The problem with this is that this horse has left the barn, headed for the highway, and been shot by police two states over. It’s just too late to try and haul back source material now, and the efforts to do so look petulant and self-serving – and are illegal.
The arguments and controls over the use of AI should concentrate on unethical usage, protection of people’s privacy, and prevention of exploitation of artists’ and actors’ right to a continuing income from the use of their images or the images they create. Authors already have protection of their text in the form of copyright. That’s enough.
FWIW, I totally agree. (No screaming required.)
Who has the right?
Publishing a book does not mean anyone has a right to read it without remunerating the owner of the book. It’s interesting that this article nowhere clearly addresses this distinction.
If I publish a book, anyone can read it, or do what they want with it, after paying for it or otherwise obtaining a legal right to it.
Re:
If I check it out from the library, I can read it for free.
Re:
Even with printed books, the number of readers was several times greater than the number of copies sold. Books were lent to other people, and sold and resold on the second-hand market.
It is also worth noting that the most common model for creators to make money on the Internet is to make works available for free, and use the likes of Patreon or Kickstarter so that people can support them in creating the next work.
Re: Re:
“the most common model”
Even if this was true for authors, it’s simply not the most lucrative one.
And as I’ve said ad nauseam here, it’s not a model that suits many authors, who are often reclusive and antisocial in their habits. Begging people who read your work to cough up a buck or a yen if they feel like it and they like you is not only not for everyone, it’s also demeaning as hell.
Authors have a right to sell their product, just as any other creator does, and to control the income derived from those sales. Readers are already getting fiction and nonfiction at astonishingly low prices through ebooks (and well below cost). Why should they get it for free and force the creators to caper like monkeys to entertain on top of the entertainment their books provide?
Re: Re: Re:
That is their problem, as they will also have problems marketing a book behind a paywall, or even working with retailers and/or a publisher to market their work. Various web sites will also allow them to sell works, but that does not help if they cannot use social media to market their own work, or find and work with a marketing person.
People tend to fixate on the marketing model, when the real problem, regardless of the marketing model, is building the fan base needed to support a creator.
Re: Re: Re:2
“building the fan base needed to support a creator”
Which doesn’t have to be through Patreon or Instagram or Fakebook.
There is a difference between using social media tools, with or without publisher support, to promote sales of books, and using social media to beg for donations. The latter is only lucrative for a few people who are gifted at working crowds. The former makes money for people who are diligent in maintaining their web/media presence and who have actual talent. The former also doesn’t mean a constant inane performance for the crowd chucking coins.
Re: Re: Re:3
I didn’t specify any particular way of building a fan base, just said that it is a requirement for making a living from creative endeavour. What is very difficult is building a fan base if you start with a paywall in the way. Further, which is most likely to be successful: ‘buy my book’, or ‘read my book, and if you like it send me a few $$’? Indeed, the first task for creative people in most fields is to gain a few fans who will start to spread the word. Also, unless they are extremely talented, it will take them several works to achieve a quality that people will pay for, and constructive feedback from a few people who see potential in their work makes that easier than rejection letters, one-star ratings, or just being ignored. Pass the quality threshold, and people will ask how they can support a creator.
Oh, the Internet is much more than the big sites, and finding the right group or forum can be the route to success.
It bears repeating: content creators might have earned a lot more sympathy if 1) They hadn’t used copyright as the basis for their legal claims and 2) They didn’t already have a history of consistently denying responsibility when machines did shitty stuff in their favor like ordering unwarranted DMCA takedowns or suing the innocent.
It reminds me of the whole Prosecraft debacle. I believe Books3 was in the background of that outrage, even though the creator didn’t use that database afaik.
Not “used” - stolen
My beef with AI is not how they use the stuff that they have. You’re right: if you steal a book from Barnes & Noble and then write a review of it, your review is fair use. The problem is the fact that you stole a book from Barnes & Noble. They are using pirated books. They didn’t pay for the books. Now, it feels silly to argue about the $5-10ish per book it would’ve cost them to buy all the books they’ve used to train the algorithms, but at least it would have been something. It would have made those companies face a little bit more of the actual cost of the labor used to create all that intellectual property.

Back to the Barnes & Noble example: they cleaned out every bookstore in town, chopped the books up into confetti, threw it into the air, and some of it landed and made sense. The confetti is not the issue. How they got the material to make the confetti is the issue. And we KNOW now it was from pirate sites. But because the books were made of ones and zeros, or because it was really easy to do, or because somebody else stole them first, suddenly it’s fair use. It literally isn’t; it’s theft! Not everything online is free. Some of it is behind paywalls. Just because you can get around those paywalls doesn’t mean it’s not a crime. And just because what you make with it afterwards is not a crime does not excuse your original crime. Don’t steal all the books. Pay for the books. (Et al.) Then turn them into eleven-fingered, Balenciaga-wearing, poisoned-mushroom werewolf porn to your heart’s content.
Re:
So here’s a counterpoint… how do you go about proving all that? Proving that specific instance(s) of theft occurred?
I’m not saying it didn’t happen, I’m saying that you’ll need more than “I could have been five to ten dollars richer” as a standard of evidence.
Re: Re:
The onus is on the AI creator to prove they bought legal copies.
Re: Re: Re:
And how would you prove that content was reused from a book, legally obtained or otherwise?
Who owns the copyright to “It was a dark and stormy night”? Who’s owed money if that sentence emerges from an AI? Or another human?
Re:
What if the reviewer sat in B&N and read the book from cover to cover before reviewing it? What if it was borrowed from a friend? What if it was second hand?
All those are completely legal actions which don’t add a penny to the author’s income.
The claim that the books were pirated is neither proven nor relevant. I’m tired of this argument. It’s ridiculous and makes those making it look stupid and greedy.
Re: Re:
This is absurd posturing; none of those are remotely implausible.
Re: Re:
You’d be right, but don’t count on things changing any time soon.
Content creators are so attached to the idea of copyright law solving every one of their perceived problems, they’re completely incapable of pursuing any other avenue or alternative.
They lived by the sword, and they’re going to die by it, having willingly dived headfirst onto it.
Not losing, giving up control
Yes!!
As a vanity-published author, I couldn’t agree more strongly.
I said much the same thing, with more words, in filling out the Copyright Office’s
Comments on Artificial Intelligence and Copyright
[Docket No. 2023-6]
forms.
I feel that a lot of this foofaraw is people trying to double-dip, just like the telcos and other big businesses. I disapprove of the practice no matter who is doing it.
tangent, sorry
I’m imagining specialized AI to remove anything that a particular society finds offensive. Sex scene? Zap! Brisk walk!
Re:
It’s more likely to be used to add sex scenes the original author is too squeamish, or otherwise loath, to add.
I’d actually be happy to have a machine write the damn things. I hate writing smut. It’s fucking dull, which makes for dull fucking 🙂
I'm in the set!
At least one of my books is in the set, and I feel uber cool right now 😉
I support shorter copyright terms. But that’s not related with my main criticism of generative AI.
My main complaint about generative AI is that the most popular tools don’t cite their sources. This is unlike search engines like Google, Bing or DuckDuckGo, which provide a snippet and a link to each source.
Re:
Do you quote your sources for every bit of writing, and every image that has influenced what you say, or taught you the meanings of words and how to express things?
Re: Re:
I dare NaBUru38 to acknowledge which words and phrases in their ordinary speech were invented by Shakespeare, or originated in the King James Bible.
CC vs. Copyright
I’m fine with public-interest organizations feeding whatever material they like into AI, provided that AI will be used for the public welfare, probably by the public at low or no cost, like a library.
However, if someone takes my book and puts it into an AI model for their own private profit, that’s like selling my book in a for-profit bookstore and not giving me a cut.
What the AI will be used for is the key thing to assess. A non-profit public AI organization should have few or no restrictions, like a public library. A for-profit private AI should have to pay creators fairly for the reuse. I shouldn’t have to watch my work be repurposed for someone else’s profit without my consent (which, presumably, you would have to pay me for).
Re:
And that is what the first sale doctrine allows for second hand books. Just how many times do you expect to be paid for each copy of your book?
Re: Re:
Content creators do seem to forget that bit whenever they set out on arguments like this. They buy into the idea that everything they do should be nickel-and-dimed like it is for modern social media influencers, and to be fair, that motivation is understandable.
The problem comes when they actually have a sit down and think about how such a scheme might be implemented, or what it means from the consumer’s perspective, or what sorts of legal activity might be adversely affected.
I think there’s plenty of room to establish legal standards maintaining existing human rights and rejecting any concept of LLMs having inspiration- or opinion-based rights.
Proposed, but open to argument: Automated systems should be legally classified as acts of copying, regardless of how much they simulate human learning or interpretation, and limited to activities where direct and cited reproduction is justified by fair use.
Re:
So if I use an automated system that copies words from other books in an order I suggest, what are the legal ramifications of that?
Digital vs Analog
Put data on the internet and anyone (or anything) can take it and do whatever they (or it) want. Put data in a paperback book (analog) and AI has no access to it.

Someone buys that book. When they’re done with it, they give it away to someone else to read, for free. That person takes the book and scans it onto the internet, regardless of the legal ramifications. Now analog has become digital, and AI takes the author’s work for free.

The point is, why does it matter if AI uses any data in any way? All data really is, in the end, is 1s and 0s. That is the dichotomy of digital. Or should I say commodity? It used to be a book; now it’s just data. AI is simply an effort to use all that data faster and better, for all humanity, and probably for profit. Analog is just a leftover way of thinking these days.