Training data can be used "regardless of whether it is for non-profit or commercial purposes, whether it is an act other than reproduction, or whether it is content obtained from illegal sites or otherwise."
The absolute right decision. Generative art is a fair use machine, not a plagiarism one. We need more fair use, not less.
Not at all… In fact, it’s totally batshit insane to determine that the biggest tech companies in the world can freely use anybody’s copyrighted data or intellectual property to train an AI and then claim to have ownership over the output.
The only way that it makes sense to have AI training be “fair use” is if the output of AI is not able to be copyrighted or commercially used, and that’s not the case here. This decision will only enable a mass, industrialized exploitation of workers, artists and creators.
Expanding on the already expansive terms of copyright is not the appropriate way to deal with the externalities of AI. This copyright-maximalist approach will hurt small artists and remix culture, drive up business costs for artists who will be dragged into court to prove their workflows didn’t involve any generative steps, and, as with every expansion of copyright, primarily help the large, already centralized corporate IP holders to further cement their position.
But would copyright law not cover the creation of a piece of art that is derivative or a copy of another piece without proper credit?
A human artist does not violate copyright merely for studying a piece of art. Only by replicating it do they violate the law.
Why should these AI models not be covered in the same way?
It’s not the right decision for the content creators. So it’s not “absolute right”.
Expanding the terms of copyright to 70 years after the life of the author actually didn’t help artists make art. Expanding copyright to cover “training” will result in more costly litigation, make things harder for small artists and creators, and further centralize the corporate IP hoarders that can afford to shoulder the increased costs of doing business. There are innumerable content creators that could and will make use of generative art to make content, and they should be allowed to prosper. We need more fair use, not less.
That’s not true? There’s nothing stopping content creators from using their own content to create models. In fact, that’s my exact project for some of my visual art.
Moreover, (edit: visual) models can’t effectively replicate the copyrighted work, so I don’t really see how they would infringe on it.
@Gutless2615 corporations stealing artists’ work to develop their for-profit software is NOT fair use.
You do realize individuals can train neural networks on their own hardware, right? Generative art and generative text are not something owned by corporations. In fact, what is optimistically becoming apparent is that it is particularly difficult to build moats around a generative model, meaning that it’s especially hard for corporations to own this technology outright. But those corporations are the only ones that benefit from expanding copyright. I also disagree with you on the other point. A trained model is a transformative work, as are the works you can generate with those models. Applying the four-factor fair use test comes out heavily on the side of fair use.
@Gutless2615 Of course individuals can train models on their own work, but if they train them on other artists’ work, that too is an unauthorized use.
Honestly, whether AI outputs can be copyrighted is really a separate issue from what I am concerned about… what matters in these cases is where and how they obtained the inputs on which they trained the models. If a corporation or individual is using other artists’ works without authorization, they are also committing theft, irrespective of any copyright infringement.
And while we’re at it, let’s throw out mashup artists, collages, remixes and fair use altogether, huh? You’re just incorrect here: fair use exists for a reason, and applying the four-factor fair use test to generative art comes out on the side of fair use nine times out of ten. What’s more, what you’re arguing for will only make it harder for small artists who get spurious accusations lobbed their way or automated takedowns from bad “AI detector” software and have to drag out in-progress files and lawyer money to argue they didn’t use generative tools in their workflow. There are better ways to make sure artists can still get paid - and, spoiler alert: it’s not just the artists that are going to get hit. We need to embrace more creative solutions to the problems of AI than “copyright harder”.
To me it’s essentially the same as someone reading a book or watching a movie when the AI learns from those examples.
Can it tell you what it learned, or does it copy billions of conversations online of what other people learned?
If it can’t interpret, it’s not learning.
All you get is the most basic form of data retention, if it retained millions of examples.
The problem is that the AI can print the book word for word if you ask the right questions, and at that point it’s breaking copyright again. That’s not a problem with the learning part, though, but with how AI has no understanding of context at all.
You can easily get Photoshop to reproduce one of Mondrian’s paintings. That’s on the user, not the tool. I fail to see why the same doesn’t apply to the tool of generative AI.
You can’t easily tell it to replicate any painting for you - with current AI you can do that with almost any book it was trained on.
A human being with really good/photographic memory can do the same.
I was skeptical of this, but it checks out: I easily got ChatGPT to print out the full text to The Tell-Tale Heart, without any errors at all in the various spots I accuracy-checked.
Granted I chose it because it’s a very short public domain work - I was more skeptical of its technical ability to recall the exact text without errors than of the ability to trick it into violating copyright law.
I still suspect it’s much easier to (accidentally) trick it into writing a fanfiction of a copyrighted work that it claims is the original than it is to get it to produce the true original, though.
Your argument that it is useful as a copyright infringing machine is that it can reproduce a public domain work? That’s… not the argument you think it is.
My message was pretty clear about which part of their claim I was skeptical about and what I was testing for. It’s not what you described here.
Just like a person with really good memory can. So what? Nobody is actually printing 300-page books that way when we can use LibGen or any other source instead.
AI has no personal agency, lived experiences, or independent creative input.
Humans don’t have the ability to synthesize thousands of pages of text in a matter of minutes.
Any analogy toward human learning or behavior is shallow and flawed.
This is why humans are involved in the process. Your counterargument is shallow and flawed
Yes, exactly. When someone is creating art using Stable Diffusion, it is clearly a manifestation of that artist’s intent. That is what copyright is designed to protect and should protect.
Wrong. Copyright protects works, not ideas.
The part that you AI bots always forget is that the machine doesn’t do shit without a dataset. No data input, no output. And if you don’t own the inputs, what the hell makes you think you can claim ownership over the outputs?
If you ask an AI art program to paint you a “pretty kitty cat”, it can only do so because it has been fed enough pictures and paintings (plus metadata) to synthesize an acceptable output. Your human intent is an insignificant filter over their data, and if they haven’t trained on any pictures of cats, you will never achieve anything even close to your intent. Your prompt has the value of a Google search.
Finally, there is a key thing called the “artistic process” in which a human artist’s imagined vision of their finished work takes shape as they work. This is nothing like what happens under a neural network, and it is why you are never going to be an artist simply by filling in a web form. You have no vision, and even if you did, the AI will never achieve it on your behalf.
Sorry, but if AI art sounds too good to be true, it’s because it is simply exploiting and distorting other people’s copyrighted artwork. It gives you the illusion of having created something, like the kid mashing buttons at the arcade machine without putting any money in. But the good news is that it’s not too late to learn how to draw.
You’re fundamentally wrong and presenting a bad-faith argument in an insulting manner. Please shut up
I was wrong to use the dismissive term “AI bots”. I’m genuinely sorry about that and I let my feelings as an artist get the best of me, but other than that my point still stands. To be fair, “you’re wrong” and “shut up” aren’t exactly the strongest counter arguments either. No hard feelings.
The objective truth is that “AI” neural networks synthesize an output based on an input dataset. There is no creativity, personality, artistry or other x-factor there, and until there is real “general artificial intelligence” there never will be. Human beings feed inputs into the machine, and it generates an output based on some subset of those inputs. If those inputs are “fair use” or otherwise licensed, then that’s perfectly fine. But if those inputs are unlicensed copyrighted works, then you would be insane to believe that you own the output that the algorithm produces. That’s like thinking you own the music that comes out of your speakers because you hit the play button. Just because you’re in control of the playback does not mean that you created the music, and nobody would seriously think that.
I’ve worked as an artist and a programmer, and a simple analogy is the concept of a software license. Just because you can see or download some source code on GitLab does not mean that you own it or can use it freely for any purpose; most code repositories are open sourced under some kind of license, which legitimate users of that code must comply with. We’ve already seen Microsoft make this mistake and then instantly backtrack with GitHub Copilot, because they understand that they simply do not have the IP rights to use GPL code (for one example) to train their AI. Similarly, if a musician samples a portion of a song to use in their own song, depending on various factors they may have to share credit with the original creator, and sometimes that makes sense, in my opinion.
No matter how you or I feel about it, copyright law has always been there with the basic intent to protect people who create unique works. There are some circumstances which are currently considered “fair use” of unlicensed copyrighted works (for example, for educational purposes), and I think that’s great. But I think there is zero argument that unlimited automated content generation via AI ought to be considered genuine fair use. No matter how much AI fans want to try to personify the technology, it is not engaging in a creative or artistic process, it is merely synthesizing an output based on mixed inputs, just like how an AI chat bot is not truly thinking but merely stringing words together.
Good luck training something that rivals big tech, especially now that they’re all putting “moats” around their data…
We, the little people, don’t have the data, the storage, the processing power, the RAM, and last but not least, the cash, to compete with them.
At any rate, if you train your NN using appropriately licensed or public domain data, more power to you. But if you feed a machine a bunch of other people’s writing, artwork, music, etc., please understand that you will never truly own the output.
You seem to be imagining a future in which AI is the great equalizer that ushers us into some kind of utopia, but right now I’m only seeing even more money, power and control being clawed away from the people in favor of the biggest, richest tech conglomerates. It’s fucking dystopian, and I hope people like you will recognize that before it’s really too late.
Direct your ire where it belongs - at capitalism, not technology.
I am.
It is only the profit-maximizing hyper-capitalists who intend to use AI to exploit workers and rip off artists. I have no problem with the technology behind AI; I just don’t think people should be using it as a tool for continual, industrialized mass exploitation of the little people (like you and me) who actually own the data that they put online.
@brimnac it’s not a ‘someone’ though. The AI isn’t an actual consciousness. It’s a software company illegally using other artists’ work to develop their own commercial product. BIG DIFFERENCE.
You learned from those same things and make a profit.
Does AI also have eyeballs and brain tissues? Do they have conscience, sentience, shame?
Do you?
What’s that got to do with the price of fish?