We Asked A.I. to Create the Joker. It Generated a Copyrighted Image.

L4sBot@lemmy.world · 10 months ago

We Asked A.I. to Create the Joker. It Generated a Copyrighted Image.

dragontamer@lemmy.world · edit-2 10 months ago

Because this proves that the “AI”, at some level, is storing the data of the Joker movie screenshot somewhere inside of its training set.

Likely because the “AI” was trained upon this image at some point. This has repercussions with regards to copyright law. It means the training set contains copyrighted data and the use of said training set could be argued as piracy.

Legal discussions on how to talk about generative-AI are only happening now, now that people can experiment with the technology. But it’s not like our laws have changed; copyright infringement is copyright infringement. If the training data is obviously copyright infringement, then the data must be retrained in a more appropriate manner.

abhibeckert@lemmy.world · edit-2 10 months ago

But where is the infringement?

This NYT article includes the same several copyrighted images and they surely haven’t paid any license. It’s obviously fair use in both cases and NYT’s claim that “it might not be fair use” is just ridiculous.

Worse, the NYT also includes exact copies of the images, while the AI ones are just very close to the original. That’s like the difference between uploading a video of yourself playing a Taylor Swift cover and actually uploading one of Taylor Swift’s own music videos to YouTube.

Even worse the NYT intentionally distributed the copyrighted images, while Midjourney did so unintentionally and specifically states it’s a breach of their terms of service. Your account might be banned if you’re caught using these prompts.

jacksilver@lemmy.world · 10 months ago

You do realize that newspapers do typically pay the licensing for images, it’s how things like Getty images exist.

On the flip side, OpenAI (and other companies) are charging someone access to their model, which is then returning copyrighted images without paying the original creator.

That’s why situations like this keep getting talked about: you have a 3rd party charging people for copyrighted materials. We can argue that it’s a tool, so you aren’t really “selling” copyrighted data, but that’s the issue that is generally being discussed in these kinds of articles/court cases.

ApollosArrow@lemmy.world · 10 months ago

Mostly playing devil’s advocate here (since I don’t think AI should be used commercially), but I’m actually curious about this, since I work in media… You can get away with using images or footage for free if it falls under editorial or educational purposes. I know this can vary from place to place, but with a lot of online news sites now charging people to view their content, they could potentially be seen as making money off of copyrighted material, couldn’t they?

jacksilver@lemmy.world · 10 months ago

It’s not a topic that I’m super well versed in, but here is a thread from a photography forum indicating that news organizations can’t take advantage of fair use https://www.dpreview.com/forums/thread/4183940.

I think these kinds of stringent rules are why so many are up in arms about how AI is being used. It’s effectively a way for big players to circumvent paying the people who put all the work into the art/music/voice acting/etc. The models would be nothing without the copyrighted material, yet no one seems to want to pay those people.

It gets more interesting when you realize that long term we still need people creating lots of content if we want these models to be able to create things around concepts that don’t yet exist (new characters, genres of music, etc.)

dragontamer@lemmy.world · 10 months ago

But where is the infringement?

Do Training weights have the data? Are the servers copying said data on a mass scale, in a way that the original copyrighters don’t want or can’t control?

orclev@lemmy.world · 10 months ago

Data is not copyrighted, only the image is. Furthermore you can not copyright a number, even though you could use a sufficiently large number to completely represent a specific image. There’s also the fact that copyright does not protect possession of works, only distribution of them. If I obtained a copyrighted work no matter the means chosen to do so, I’ve committed no crime so long as I don’t duplicate that work. This gets into a legal grey area around computers and the fundamental way they work, but it was already kind of fuzzy if you really think about it anyway. Does viewing a copyrighted image violate copyright? The visual data of that image has been copied into your brain. You have the memory of that image. If you have the talent you could even reproduce that copyrighted work so clearly a copy of it exists in your brain.
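To make the "sufficiently large number" point concrete, here is a minimal Python sketch. It only shows that any file's bytes can be read as one big integer and turned back again; the function names and the tiny sample "image" are hypothetical, made up for illustration, and nothing here says anything about how an AI model stores data.

```python
def image_bytes_to_number(data: bytes) -> int:
    """Interpret raw file bytes as a single (very large) integer."""
    # A leading 0x01 marker byte is prepended so that leading zero bytes
    # in the file are not lost when converting back.
    return int.from_bytes(b"\x01" + data, "big")


def number_to_image_bytes(number: int) -> bytes:
    """Recover the original bytes from the integer produced above."""
    length = (number.bit_length() + 7) // 8
    raw = number.to_bytes(length, "big")
    return raw[1:]  # drop the marker byte


# Hypothetical stand-in for an image file: a few arbitrary bytes.
sample_image = bytes([0, 0, 255, 128, 64, 32])
as_number = image_bytes_to_number(sample_image)
restored = number_to_image_bytes(as_number)
print(as_number)
print(restored == sample_image)
```

The round trip is lossless, which is the whole point of the argument: the number and the image carry the same information, so the legal question is about the work, not about the encoding.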

dragontamer@lemmy.world · edit-2 10 months ago

only distribution of them.

Yeah. And the hard drives and networks that pass Midjourney’s network weights around?

That’s distribution. Did Midjourney obtain a license from the artists to allow large numbers of “Joker” copyrighted data to be copied on a ton of servers in their data-center so that Midjourney can run? They’re clearly letting the public use this data.

orclev@lemmy.world · 10 months ago

Because they’re not copying around images of Joker, they’re copying around a work derived from many many things including images of Joker. Copying a derived work does not violate the copyright of the work it was derived from. The wrinkle in this case is that you can extract something very similar to the original works back out of the derived work after the fact. It would be like if you could bake a cake, pass it around, and then down the line pull a whole egg back out of it. Maybe not the exact egg you started with, but one very similar to it. This is a situation completely unlike anything that’s come before it which is why it’s not actually covered by copyright. New laws will need to be drafted (or at a bare minimum legal judgements made) to decide how exactly this situation should be handled.

dragontamer@lemmy.world · 10 months ago

derived

https://www.law.cornell.edu/wex/derivative_work

Copyrights allow their owners to decide how their works can be used, including creating new derivative works off of the original product. Derivative works can be created with the permission of the copyright owner or from works in the public domain. In order to receive copyright protection, a derivative work must add a sufficient amount of change to the original work.

Are you just making shit up?

abhibeckert@lemmy.world · 10 months ago

Do Training weights have the data?

The answer to that question is extensively documented by thousands of research papers - it’s not up for debate.

orclev@lemmy.world · 10 months ago

Wasn’t that known? Has Midjourney ever claimed they didn’t use copyrighted works? There’s also an ongoing argument about the legality of that in general. One recent court case ruled that copyright does not protect a work from being used to train an AI. I’m sure that’s far from the final word on the topic, but it does mean this is a legal grey area at the moment.

dragontamer@lemmy.world · edit-2 10 months ago

If it is known, then it is copyright infringement to download the training sets and therefore a crime to do so. You cannot reproduce a copy of the works without the express permission of the copyright holder.

How many computers did Midjourney copy its training weights to? Has Midjourney (and the IT team behind it) paid royalties for every copyrighted image in its training set to have a proper copyright license to copy all of this data from computer to computer?

I’m guessing no. Which means the Midjourney team (if what you say is true) is committing copyright infringement every time they spin up a new server with these weights.

Pro-AI side will obviously argue that the training weights do not contain the data of these copyrighted works. A claim that is looking more-and-more laughable as these experiments happen.

db0@lemmy.dbzer0.com · 10 months ago

No, it’s not illegal to download publicly available content. It’s a copyright violation to republish it.

Jilanico@lemmy.world · 10 months ago

Because this proves that the “AI”, at some level, is storing the data of the Joker movie screenshot somewhere inside of its training set.

Is it tho? Honest question.

dragontamer@lemmy.world · 10 months ago

How did the Joker image get replicated?

Jilanico@lemmy.world · 10 months ago

It’s too hard to type up how generative AIs work, but look up a video on “how stable diffusion works” or something like that. I seriously doubt they have a massive database with every image from the Internet inside it, with the AI just spitting those pics out, but I’m no expert.

QubaXR@lemmy.world · edit-2 10 months ago

Yes it is. Honest answer.

Jilanico@lemmy.world · 10 months ago

So stable diffusion, midjourney, etc., all have massive databases with every picture on the Internet stored in them? I know the AI models are trained on lots of images, but are the images actually stored? I’m skeptical, but I’m no expert.
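One rough way to frame the "are the images actually stored?" question is back-of-envelope arithmetic: divide the size of the model file by the number of training images. The sketch below does only that. Both numbers are hypothetical round figures chosen for illustration, not measurements of Stable Diffusion, Midjourney, or any real system.

```python
def average_bytes_per_image(model_size_bytes: int, num_training_images: int) -> float:
    """Average number of model bytes available per training image."""
    if num_training_images <= 0:
        raise ValueError("num_training_images must be positive")
    return model_size_bytes / num_training_images


# Hypothetical, illustrative values only (assumptions, not real figures).
HYPOTHETICAL_MODEL_BYTES = 4 * 10**9   # a 4 GB weights file
HYPOTHETICAL_IMAGE_COUNT = 2 * 10**9   # 2 billion training images

budget = average_bytes_per_image(HYPOTHETICAL_MODEL_BYTES, HYPOTHETICAL_IMAGE_COUNT)
print(f"{budget:.1f} bytes per image on average")
```

Under those assumed numbers the average budget is far too small to hold a full copy of every image. That is only an average, though: it says nothing about whether particular images that appear many times in the training data can still be reproduced closely, which is what the Joker example in the article is about.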

QubaXR@lemmy.world · 10 months ago

These models were trained on datasets that used authors’ work as training material without compensating them. It’s not every picture on the net, but a lot of it comes from scraping websites, portfolios and social networks wholesale.

A similar situation happens with large language models. Recently Meta admitted to using illegally pirated books (Books3 database to be precise) to train their LLM without any plans to compensate the authors, or even as much as paying for a single copy of each book used.

Jilanico@lemmy.world · 10 months ago

Most of the stuff that inspires me probably wasn’t paid for. I just randomly saw it online or on the street, much like an AI.

AI using straight up pirated content does give me pause tho.

QubaXR@lemmy.world · edit-2 10 months ago

I was on the same page as you for the longest time. I cringed at the whole “No AI” movement and artists’ protest. I used the very same idea: Generations of artists honed their skills by observing the masters, copying their techniques and only then developing their own unique style. Why should AI be any different? Surely AI will not just copy works wholesale and instead learn color, composition, texture and other aspects of various works to find its own identity.

It was only when my very own prompts started producing results I recognized as “homages” at best and “rip-offs” at worst that it gave me pause.

I suspect that earlier generations of text to image models had better moderation of training data. As the arms race heated up and pace of development picked up, companies running these services started rapidly incorporating whatever training data they could get their hands on, ethics, copyright or artists’ rights be damned.

I remember when MidJourney introduced Niji (their anime model) and I could often identify the mangas and characters used to train it. The imagery Niji produced kept certain distinct and unique elements of character designs from that training data - as a result a lot of characters exhibited “Chainsaw Man” pointy teeth and sticking out tongue - without as much as a mention of the source material or even the themes.

topinambour_rex@lemmy.world · 10 months ago

How much profit do you make from this stuff?

Jilanico@lemmy.world · 10 months ago

The stuff I sell on jilanico.com? Enough to make it worth my while.

orclev@lemmy.world · 10 months ago

If the training data is obviously copyright infringement, then the data must be retrained in a more appropriate manner.

This is the crux of the issue, it isn’t obviously copyright infringement. Currently copyright is completely silent on the matter one way or another.

The thing that makes this particularly interesting is that the traditional copyright maximalists, the ones responsible for ballooning copyright durations from its original reasonable limit of 14 years (plus one renewal) to its current absurd duration of 95 years, also stand to benefit greatly from generative works. Instead of the usual full court press we tend to see from the major corporations around anything copyright related we’re instead seeing them take a rather hands off approach.

dragontamer@lemmy.world · 10 months ago

This is the crux of the issue, it isn’t obviously copyright infringement. Currently copyright is completely silent on the matter one way or another.

It’s clear that the training weights have the data on recreating this Joker scene. It’s also clear that if the training data didn’t contain this image, then a copy of the image would never have ended up in the weights that have been copy/pasted everywhere.

orclev@lemmy.world · edit-2 10 months ago

Except it isn’t a perfect copy. It’s very similar, but not exact. Additionally for every example you can find where it spits out a nearly identical image you can also find one where it produces nothing like it. Even more complicated you can get images generated that very closely match other copyrighted works, but which the model was never trained on. Does that mean copying the model violates the copyright of a work that it literally couldn’t have included in its data?

You’re making a lot of assumptions and arguments that copyright covers things that it very much does not cover or at a minimum that it hasn’t (yet) been ruled to cover.

Legally, as things currently stand, an AI model trained on a copyrighted work is not a copy of that work as far as copyright is concerned. That’s today’s legal reality. That might change in the future, but that’s far from certain, and is a far more nuanced and complicated problem than you’re making it out to be.

Any legal decision that ruled an AI model is a copy of all the works used to train it would also likely have very far reaching and complicated ramifications. That’s why this needs to be argued out in court, but until then what midjourney is doing is perfectly legal.

dragontamer@lemmy.world · 10 months ago

https://www.law.cornell.edu/wex/derivative_work

Copyrights allow their owners to decide how their works can be used, including creating new derivative works off of the original product. Derivative works can be created with the permission of the copyright owner or from works in the public domain. In order to receive copyright protection, a derivative work must add a sufficient amount of change to the original work.

The law is very clear on the nature of derivative works of copyrighted material.

orclev@lemmy.world · 10 months ago

Not sure where they’re getting the bit about copyright disallowing derived works as that’s just not true. You can get permission to create a derived work, but you don’t need permission to create a derived work so long as the final result does not substantially consist of the original work.

Unfortunately what constitutes “substantially” is somewhat vague. Various rulings have been made around that point, but I believe a common figure used is 30%. By that metric any given image represents substantially less than 30% of any AI model so the model itself is a perfectly legal derived work with its own copyright separate from the various works that were combined to create it.

Ultimately though the issue here is that the wrong tool is being used, copyright just doesn’t cover this case, it’s just what people are most familiar with (not to mention most people are very poorly educated about it) so that’s what everyone reaches for by default.

With generative AI what we have is a tool that can be used to trivially produce works that are substantially similar to existing copyrighted works. In this regard it’s less like a photocopier, and more like Photoshop, but with the critical difference that no particular talent is necessary to create the reproduction. Because it’s so easy to use people keep focusing on trying to kill the tool rather than trying to police the people using it. But they’re going about it all wrong, copyright isn’t the right weapon if that’s your goal. Copyright can be used to go after the people using generative AI tools, but not the people creating the tools.

dragontamer@lemmy.world · 10 months ago

Because it’s so easy to use people keep focusing on trying to kill the tool rather than trying to police the people using it. But they’re going about it all wrong, copyright isn’t the right weapon if that’s your goal. Copyright can be used to go after the people using generative AI tools, but not the people creating the tools.

Why? If the training weights are created and distributed in violation of copyright laws, it seems appropriate to punish those illegal training weights.

In fact, all that people really are asking for, is for a new set of training weights to be developed but with appropriate copyright controls. IE: With express permission from the artists and/or entities who made the work.

orclev@lemmy.world · 10 months ago

Why? If the training weights are created and distributed in violation of copyright laws, it seems appropriate to punish those illegal training weights.

Because they aren’t illegal and they don’t violate copyright. People keep wanting them to be against copyright, but that’s just not how copyright works. There would need to be amendments to copyright law in order to cover this case, and those changes would need to be very carefully tailored. It would be way too easy to make something that’s either overly broad and applies to a bunch of situations it wasn’t intended to, or way too narrow, allowing for easy circumvention.

In fact, all that people really are asking for, is for a new set of training weights to be developed but with appropriate copyright controls. IE: With express permission from the artists and/or entities who made the work.

While that might appease some people, it wouldn’t appease everyone. There are a lot of workers in the creative fields that are feeling incredibly threatened by generative AI right now. Some of these fears are certainly overblown, but it’s also true corporations are going to be as shitty as possible and so some regulation is probably in order. That said, once again, copyright just doesn’t seem to be the right tool for the job here.

dragontamer@lemmy.world · edit-2 10 months ago

Because they aren’t illegal and they don’t violate copyright

Because they are illegal and they do violate copyright? People keep wanting them to be copyright free, but that’s not how copyright works. There don’t need to be amendments to copyright law in order to cover this case.

I mean, it’s obviously heading to the courts one way or the other, but I don’t think just making assertions like that is a very good way of arguing. The training weights here have clearly been proven to contain copyrighted data as per this article. I’m not sure you’re making any kind of serious case that shows otherwise; you’re just making a bunch of assertions that I could easily reverse.

rsuri@lemmy.world · 10 months ago

But it’s not like our laws have changed

And that’s the problem. The internet has drastically reduced the cost of copying information, to the point where entirely new uses like this one are now possible. But those new uses are stifled by copyright law that originates from a time when the only cost was that people with gutenberg presses would be prohibited from printing slightly cheaper books. And there’s no discussion of changing it because the people who benefit from those laws literally are the media.

dragontamer@lemmy.world · 10 months ago

Copyright was literally invented because it’s cheap and easy to copy information (i.e. the printing press).

When copies are easy, you screw over the original artist. A large scale regulation of copies must be enforced by the central authorities to make sure small artists get the payments that they deserve. It doesn’t matter if you use a printing press, a xerox machine, a photograph, a phonograph, a record, a CD-ROM copy, a tape recorder, or the newest and fanciest AI to copy someone’s work. It’s a copy, and therefore under the copyright regulations.

LainTrain@lemmy.dbzer0.com · 10 months ago

By that logic I am also storing that image in my dataset, because I know and remember this exact image. I can reproduce it from memory too.

dragontamer@lemmy.world · 10 months ago

You ever try to do a public performance of a copyrighted work, like “Happy Birthday to You” ??

You get sued. Even if it’s from memory. Welcome to copyright law. There’s a reason why every restaurant had to make up a new “Happy Happy Birthday, from the Birthday Crew” song.

LainTrain@lemmy.dbzer0.com · 10 months ago

Yeah, but until I perform it without a license for profit, I don’t get sued.

So it’s up to the user to make sure that if any material that is generated is copyright infringing, it should not be used.

dragontamer@lemmy.world · 10 months ago

Otakon anime music videos have no profits but they explicitly get a license from RIAA to play songs in public.

LainTrain@lemmy.dbzer0.com · 10 months ago

So? I’m not saying those are fair terms, I would also prefer if that were not the case, but AI isn’t performing in public any more than having a guitar with you in public is ripping off Metallica.

dragontamer@lemmy.world · edit-2 10 months ago

You don’t need to perform “for profit” to get sued for copyright infringement.

but AI isn’t performing in public any more than having a guitar with you in public is ripping off Metallica.

Is the Joker image in that article derivative or substantially similar to a copyrighted work? Is the query available to anyone who uses Midjourney? Are the training weights being copied from server-to-server behind the scenes? Were the training weights derived from copyrighted data?

LainTrain@lemmy.dbzer0.com · 10 months ago

Yes and none of that matters in the slightest. By that logic the Library of Babel is also copyright infringement. By that logic my memory of the movie is copyright infringing even if I don’t do anything with it.

dragontamer@lemmy.world · 10 months ago

You’re taking a fictional work and trying to apply real world laws to it?

Copyright assumes that the Library of Babel would take up so much space that it’d be impossible to create.

Which is true. Every possible combination of letters, spaces, and characters would never fit on anything in today’s universe (be it a 24 TB Hard Drive, or even a collection of thousands of them).
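The scale can be sketched with a little arithmetic. The Python below assumes the book format from Borges’ short story (25 symbols, 410 pages of 40 lines of 80 characters), which is not stated in this thread, and assumes one byte per character on disk. It counts how many decimal digits the number of distinct books has, and how many books a single 24 TB drive could hold.

```python
import math

# Assumed book format, taken from the short story rather than this thread.
SYMBOLS = 25
CHARS_PER_BOOK = 410 * 40 * 80  # pages * lines per page * characters per line

# A 24 TB drive, assuming decimal terabytes and one byte per character.
DRIVE_BYTES = 24 * 10**12


def digits_in_book_count(symbols: int = SYMBOLS, chars: int = CHARS_PER_BOOK) -> int:
    """Number of decimal digits in symbols ** chars, without computing the power."""
    return math.floor(chars * math.log10(symbols)) + 1


def books_per_drive(drive_bytes: int = DRIVE_BYTES, chars: int = CHARS_PER_BOOK) -> int:
    """How many whole books fit on one drive at one byte per character."""
    return drive_bytes // chars


print("digits in the number of distinct books:", digits_in_book_count())
print("books that fit on one 24 TB drive:", books_per_drive())
```

One drive holds a number of books with only a handful of digits, while the count of distinct books has well over a million digits, so no collection of drives comes close.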

Secondly: any computer-generated work is automatically non-copyrighted as per US Law.