Training Generative AI Models on Copyrighted Works Is Fair Use - Change My Mind

commie@lemmy.dbzer0.com · 10 months ago

NevermindNoMind@lemmy.world · 10 months ago

Google scanned millions of books and made them available online. Courts ruled that was fair use because the purpose and interface of Google Books didn’t lend themselves to actually reading the books, just to searching them for information. If that is fair use, then I don’t see how training an LLM (which doesn’t retain an exact copy of the training data, at least in the vast majority of cases) isn’t fair use. You aren’t going to get an argument from me.

I think most people who will disagree are reflexively anti-AI, and that’s fine. But I just haven’t heard a good argument that AI training isn’t fair use.

commie@lemmy.dbzer0.com · 10 months ago

here’s a sidechannel attack on your position: every use, even an infringing use, is fair use until adjudicated, because what fair use means is that a court has agreed that your infringing use is allowed. so of course ai training (broadly) is always fair use. but particular instances of ai training may be found not to be fair use, and so we can’t be sure that you are always going to be right (for the specific ai models that may come into question legally).

Semperverus@lemmy.world · 10 months ago

“It’s perfectly legal unless you get caught!”

Even_Adder@lemmy.dbzer0.com · 10 months ago

Here’s another good one: https://www.eff.org/deeplinks/2023/04/how-we-think-about-copyright-and-ai-art-0

Cyber Yuki@lemmy.world · 10 months ago

What constitutes fair use?

17 U.S.C. § 107

Notwithstanding the provisions of sections 17 U.S.C. § 106 and 17 U.S.C. § 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright.

GenAI training, at least regarding art, is neither criticism, comment, news reporting, scholarship, nor research.

AI training is not done by scientists but by engineers at a corporate entity with a long-term profit goal.

So, by elimination, we can conclude that none of the purposes covered by the fair use doctrine apply to Generative AI training.

Q.E.D.

General_Effort@lemmy.world · 10 months ago

“Such as” means that these are examples and not an exhaustive list.

Can you explain how the 3 factors you listed rule out scholarship or research purpose? Regarding the first factor, how do you determine that AI developers are all engineers and never computer scientists?

commie@lemmy.dbzer0.com · 10 months ago

it is pretty obviously scholarship and research

makyo@lemmy.world · 10 months ago

Sure, training can be fair use, but only if using the resulting models can also be fair use.

cyd@lemmy.world · 10 months ago

Agreed. I would also argue that trained model weights are not copyrightable.

kromem@lemmy.world · 10 months ago

They aren’t.

Courts have already ruled that copyright requires human creation, and weights are not decided by humans but by the training algorithms.

cyd@lemmy.world · 10 months ago

I didn’t know it was already settled law. But in that case, why are models like Llama still released under licenses? If they are non-copyrightable, licenses should be unenforceable and therefore irrelevant.

kromem@lemmy.world · 10 months ago

The license is related to access.

Basically it’s gated and not publicly available, and the only way to open the gate is to say “I promise not to do anything outside what you are limiting me to do.”

A second person who gets access without agreeing to that can use the weights however they want (which is what copyright would relate to), but the person who gave them access to the weights would be in breach of their agreement.

So separate things with different scopes.