I fucked with the title a bit. What I linked to was actually a Mastodon post linking to an actual thing. But in my defense, I found it because Cory Doctorow boosted it, so, in a way, I am providing the original source here.
Please argue. Please do not remove.
I think we should have a rule that says if an LLM company invokes fair use on the training inputs, then the outputs are public domain.
That’s already been ruled on once.
Why would companies care about copyright of the output? The value is in the tool that creates it. The whole issue to me revolves around the AI company profiting from its service, a service built on a massive library of copyrighted works. It seems clear to me that a large portion of their revenue should go equally to the owners of the works in their database.
You can still copyright AI works, you just can’t name an AI as the author.
That’s just saying you can claim copyright if you lie about authorship. The problem then is that you may step into the realm of fraud.
You don’t have to lie about authorship. You should read the guidance.
Well, what you initially said sounded like fraud, but the incredibly long page indeed doesn’t talk about fraud. However, it also seems a bit vague. What counts as your contribution to the work? Is it having your work be part of the input the model was trained on, “I wrote the prompt”, or making additional changes based on the result?
The vagueness surrounding contributions is particularly troubling. Without clearer guidelines, this seems like a recipe for lawsuits.
The outputs are not copyrightable.
But something not being copyrightable doesn’t necessarily mean it gets openly distributed.
It does mean OpenAI can’t really restrict or go after other companies training off of GPT-4 outputs, though, which is already happening broadly.
Not just the outputs, but the models as well.