I'd really like to have some legal clarification of the requirements to be able to claim any copyright over the output of an LLM. I've heard reasonable-sounding claims for several different possibilities:
- The output is not copyrightable at all and is effectively public domain
- The user prompting the LLM owns the output
- The company that trained and fine-tuned the model owns the output (which could then be granted to the user depending on the license)
There are valid arguments to be made for all of these (but my money is on the first one). Most GNU projects I've been following have not been allowing contributions of code generated by an LLM because of this legal uncertainty around the copyrightability of the code.
I'm a little shocked at how the world's largest software companies (and the smaller one I work at) have adopted prompt-based agentic development with little care about tracking which code was human-written vs LLM-generated. If the shoe drops and it turns out that none of this LLM output is copyrightable, or only under specific conditions, it's going to be pretty hard to prove in court which parts you own and which you don't.
As a dev, I have other concerns, primarily that I don't want my ability to develop software to become dependent on a chokepoint controlled by a handful of billion/trillion dollar software companies. I try to keep my infrastructure and tooling 100% open source because I've seen too many rug pulls in the past.
I think it was Jess who made some interesting comments about how much influence and control the individuals who train and fine-tune a model have over its outputs. She seemed to imply that the output of a model can arguably be considered the speech of the model's creator (she did caveat that there were of course nuances). If the output of the model is their speech, would that then mean that they have de facto copyright over the output (unless the creator specifically disclaims that in their license terms)?
There are arguments that the copyright of the input training data can apply to the outputs of the model. Verbatim copying of a specific training input into the output would certainly do that, and however unlikely that seems, there have been numerous examples of it actually happening. I can't pull up a good link now but I've definitely come across several sources of this in the past. More generally, there is the argument that the output of a model is derivative of its inputs, so the copyright would carry over.
There are arguments claiming that the output of a model can be considered to be owned by the person prompting it. At least that was the argument for UK law. I think this is what most people assume is true, regardless of whether it actually is or not.
Then there's the argument that the output of AI is not eligible for copyright at all. E.g. “given current generally available technology, prompts alone do not provide sufficient human control to make users of an AI system the authors of the output.” (congress.gov).
I'm a software developer. I don't currently use AI/LLMs, partly because I'm not okay with making my workflows even more dependent on a handful of billion/trillion dollar companies, but also because there is so much legal uncertainty. I'd sure like some clear guidance on this, because frankly everybody seems to be burying their heads in the sand about this at the moment.
Techdirt has not posted any stories submitted by m11k.