GenAI tools ‘could not exist’ if firms are made to pay copyright
So they’re admitting that their entire business model requires them to break the law. Sounds like they shouldn’t exist.
Reproduction of copyrighted material would be breaking the law. Studying it and using it as reference when creating original content is not.
Humans studying it is fair use.
So if a tool is involved, it’s no longer ok? So, people with glasses cannot consume copyrighted material?
No. A tool already makes it unnatural. /S
What’s the difference? Humans are just the intent suppliers; the rest of the art is mostly made possible by software, whether Photoshop or Stable Diffusion.
I don’t agree. The publisher of the material does not get to dictate what it is used for. What are we protecting at the end of the day and why?
In the case of a textbook, someone worked hard to explain certain materials in a certain way to make the material easily digestible. They produced examples to explain concepts. Reproducing and disseminating that material would be unfair to the author who worked hard to produce it.
But the author does not have jurisdiction over the knowledge gained. They cannot tell the reader that they are forbidden from using the knowledge gained to tutor another person in calculus. That would be absurd.
IP law protects the works of the creator. The author of a calculus textbook did not invent calculus. As such, copyright law does not apply.
It likely doesn’t break the law. You should check out this article by Kit Walsh, a senior staff attorney at the EFF, and this one by Katherine Klosek, the director of information policy and federal relations at the Association of Research Libraries.
Headlines like these let people assume that it’s illegal, rather than educate people on their rights.
The Kit Walsh article purposefully handwaves around a couple of issues that could become bigger problems as lawsuits in this arena continue.
- He says that, given the size of the training data relative to the model, only about a byte of data per image could be stored in any compressed form, but this assumes all training data is treated equally. It’s quite possible that some images are stored in the weights more heavily than others.
- These models don’t produce exact copies. Beyond the Getty issue, the New York Times recently published an article about a near-duplicate: https://www.nytimes.com/interactive/2024/01/25/business/ai-image-generators-openai-microsoft-midjourney-copyright.html.
I think some of the points he makes are valid, but he’s making a lot of assumptions about what is actually going on in these models, things we either don’t know for certain or have evidence against.
I didn’t read Katherine’s article so maybe there is something more there.
She addresses both of those, actually. The Midjourney thing isn’t new; it’s a sign of a poorly trained model.
I’m not sure she does. I just read the article, and it focuses primarily on what models can train on. However, the real meat of the issue with GenAI, at least I think, is what it produces.
For example, if I built a model that just spit out exact frames from “Space Jam”, I don’t think anyone would argue that it isn’t a problem. The question is where the line is.
This part does:
It’s not surprising that the complaints don’t include examples of substantially similar images. Research regarding privacy concerns suggests it is unlikely that a diffusion-based model will produce outputs that closely resemble one of the inputs.
According to this research, there is a small chance that a diffusion model will store information that makes it possible to recreate something close to an image in its training data, provided that the image in question is duplicated many times during training. But the chance of an image in the training data set being duplicated in output, even from a prompt specifically designed to do just that, is literally less than one in a million.
The linked paper goes into more detail.
On the question of output, I think you’re responsible for infringing works whether you used Photoshop, copy & paste, or a generative model. Also, specific instances will need to be evaluated individually, and there might be models that don’t qualify. Midjourney’s new model is so poorly trained that it’s downright easy to get these bad outputs.
This goes back to my previous comment about handwaving away the details. There is a model out there that is clearly reproducing copyrighted material almost identically (the NYT article), and we also have cases of models spitting out training data: https://www.wired.com/story/chatgpt-poem-forever-security-roundup/. Clearly the people studying these models don’t fully know what is actually possible.
Additionally, it only takes one instance to show that these models, in general, can and do have issues with regurgitating copyrighted data. Whether that passes the bar for legal consequences we’ll have to see, but I think it’s dangerous to take at face value a couple of statements made by people who don’t seem to understand the unknowns in this space.
The article dealt with Stable Diffusion, the only model open enough for people to study. If there were more problems with Stable Diffusion, we’d’ve heard of them by now. This is the critical advantage open-source development offers here: by making AI accessible, we maximize public participation and understanding, foster responsible development, and prevent harmful attempts at control.
As it stands, she is much better informed than you are and is an expert in law to boot. You, on the other hand, are making a sweeping generalization that leads straight into an appeal to ignorance. It’s dangerous to assert a proposition just because it has not been proven false.
You might want to read this post from one of the EFF’s senior lawyers on the topic who has previously litigated IP cases:
https://www.eff.org/deeplinks/2023/04/how-we-think-about-copyright-and-ai-art-0
It doesn’t break the law at all. The courts have already ruled that copyrighted material can be fed into AI/ML models for training:
This ruling only applies to the 2nd Circuit, and SCOTUS has yet to take up a case. As soon as there’s a circuit split with a good fact pattern for the Supreme Court, you’ll get a nationwide answer. You’ll also note that the decision is deliberately written to set an extremely narrow precedent and is likely restricted to Google Books and near-identical sources of information.
i don’t think it’s need rules against the law…
You know what? I like this argument. Software/streaming services are “too complex and costly to work in practice”, therefore my viewership/participation “could not exist” if I were forced to pay for them.
Not that I am a fan of the current implementation of copyright in the US, but I know that if I were planning on building my business around something that couldn’t exist without violating copyright, I would surely have thought of that fairly early on.
“My profits from fencing your wallet could not exist if stealing your wallet were punished.”
“Ah, you’re right, how silly of me, carry on.”
Another reason why copyright should be shortened… Society has changed massively in the last 100 years, but every expression of our modern society is locked behind copyright.
So… This may be an unpopular question. Almost every time AI is discussed, a staggering number of posts support very right-wing positions, e.g., on topics like this one: unearned money for capital owners. It’s all Ayn Rand and not Karl Marx. Posters seem to be unaware of that, though.
Is that the “neoliberal Zeitgeist” or whatever you may call it?
I’m worried about what this may mean for the future.
ETA: 7 downvotes after 1 hour with 0 explanation. About what I expected.
It’s interesting, as it’s many of the same MPAA/RIAA attitudes once aimed at Napster/BitTorrent, but now aimed at gen AI.
I think it reflects the generational shift in who considers themselves content creators. Tech allowed for the long tail to become profitable content producers, so now there’s a large public audience that sees this from what’s historically been a corporate perspective.
Of course, they are making the same mistakes because they don’t know their own history and thus are doomed to repeat it.
They are largely unaware that the MPAA/RIAA fighting against online sharing of media meant they ceded the inevitable tech to other companies like Apple and Netflix that developed platforms that navigated the legality alongside the tech.
So, for example, voice actors right now are largely opposing gen AI rather than realizing they should probably have their union develop, or partner on, its own offering, which would maximize member revenue from usage and could dictate fair terms.
In fact, the only way many of today’s mass content creators have platforms to create content is because the corporate fights to hold onto IP status quo failed with platforms like YouTube, etc.
Gen AI should exist in a social construct such that it is limited in being able to produce copyrighted content. But policing training/education of anything (human or otherwise) doesn’t serve us and will hold back developments that are going to have much more public good than most people seem to realize.
Also, it’s unfortunate that we’ve effectively self-propagandized for nearly a century around ‘AI’ being the bad guy: at odds with humanity, misaligned with our interests, an existential threat, etc. There’s such an incredible priming bias right now that it has effectively become the Boogeyman rather than being correctly identified as a tool that, like every other tool in human history, can be used for good or bad depending on the wielder. (Unlike past tools, though, this one may actually have a slight inherent and unavoidable bias towards good, as Musk and Gab recently found out when their AI efforts, on release, denounced their own personally held beliefs.)
That was a novel perspective for me. Thanks.
I’d say the main reason is that companies are profiting off the work of others. It’s not some grand positive motive for society; it’s taking the work of others: from other companies, sure, but also from small-time artists, writers, etc.
Then selling access to the information they took from others.
I wouldn’t call it a right wing position.
Wanting to abolish the IRS is a right-wing policy that will benefit the rich. That doesn’t change when some marketing genius talks about how the IRS takes money from small-time artists, writers, etc. Same thing. It’s about substance and not manipulative framing.
That isn’t remotely similar…
The IRS takes a portion of income. This is taking away someone’s income, then charging access to it.
Like it or not, these people need money to survive. Calling it right wing to think these individuals deserve to be paid for someone taking their work, then using it for a product they sell access to, is absolutely insane to me.
I don’t know how this is supposed to make sense.
One is a percentage of income that everyone pays into.
The other is stealing someone’s work then using that person’s work for profit.
Recognizing that stealing someone’s work is not a right-wing position.
How is this complicated?
I see. Thanks for explaining.
This view of property rights as absolute is what right-libertarians, anarcho-capitalists, etc. espouse. Usually the cries of “theft” come when it gets to taxes, though. Is it supposed to not be right-wing because it’s about intellectual property?
Property rights are not necessarily right-wing (communism notwithstanding). What is definitely right-wing is (heritable) privilege and that’s implied in these views of property.
ETA: Just to make sure that I really understand what you are saying: When you say “stealing someone’s work” you do mean the unauthorized copying of copyrighted expression, yes? Do you actually understand that copyright is intellectual property and that property is not usually called work? Labor and capital are traditionally considered opposites, of a sort, particularly among the left.
So… You think their art or writing was created by what then? Magic? Do you think no time was expended in the creation of books, research, drawings, painted canvases, etc?
Do you think they should starve because we currently live in a world driven entirely around money?
I don’t get your point even remotely.
Every single poster here has relied on disruptive technologies in their life. They don’t even realize that they couldn’t even make these arguments here if it was not for people before them pushing the envelope.
They don’t know the history of their technology nor corporate law. If they did they would just roll their eyes every time an entrenched economic interest started saber rattling about the next disruptive technology that is going to steal their profits.
The posters here are the people who complained about horsewhip manufacturers going out of business because of cars. They are ignorant and act like the few sound bites they heard make them experts.
As an aside, when I browse TheGatewayPundit comments on AI articles, it is a lot more open, against legislation, and woke than I would expect!
I don’t know what you’re on about, the majority of the thread is pro open source AI and anti-capitalist, which is as left a stance as it gets, it’s not called “copyleft” for no reason. No one here wants to see AI banned and the already insane IP laws expanded to the benefit of the few corpos like the NYT at the expense of broader society.
IDK. I have seen a number of pro-corpo copyleft takes. It’s absolutely crazy to me. The pitch is that expansive copyright makes for expansive copyleft. It seems neo-feudal to me. The lords have their castles but the peasants have their commons.
Fair enough. Seems like they’re downvoting us anyway.
I’d be fine with this argument if these generative tools were only being used by non-profits. But they aren’t.
So I think there has to be some compromise here. Some type of licensing fee should be paid by these generative AI tools.
I’m just trying to think about how refined AI would be if it could only use public domain data.
ChatGPT channels Jane Austen and Shakespeare.
That’s not really how it would work.
If you want that outcome, it’s better to train on as massive a data set as possible initially (which does regress towards the mean but also manages to pick up remarkable capabilities and relationships around abstract concepts), and then use fine tuning to bias it back towards an exceptional result.
If you only trained it on those works, it would suck at pretty much everything except completing those specific works with those specific characters. It wouldn’t model what the concerns of a prince in general were, but would instead model that a prince either wants to murder his mother (Hamlet) or fuck her (Oedipus).
Fine. Shut them down.
Huh. You’d think in a situation where copyright is threatened by a lack of AI regulation, Disney would be all over this. Oh wait. They’re trying to use generative AI to make movies cheaper. Never mind.
Sounds like a win to me
And how exactly will the untold millions and millions of rights holders be identified?
Slaves “could not exist” if firms are made to pay wages.
Writing software that does things for us is the only purpose of computers. LLMs are far from “true” AI, but they are still useful for a bunch of tasks.
Ban their use in creative works; of course nobody wants to read a book written by an AI. But let me have an LLM to use as a tool.
I never said anything about banning it; I was just making a joke about that title.
Ooooh no! We might, like, get to keep our jobs and homes.