Mateys! We have plundered the shores of TV shows and movies as these corporations flounder in stopping us from seeding and spreading their files without regard for the flag of copyright. We have long plundered the shores of gaming and broken the DRM that has been plaguing modern games, making games accessible in countries where a game would cost a week or even a month of wages (I was once in this situation, so I am grateful to the pirating community for letting me enjoy the golden era of games back in 2012-2015).
But there, upon the horizon, lies a larger plunder. A kraken who guards a lair of untouched gold and emeralds, ready for the taking.
Closed-source AI models.
These corporations have stolen what was once ours, our own data, and put it in their AI models so that only they can profit off of it. These corporations raze the internet with their spiders and their bots to gather every morsel of data from us that they can feed to their shiny new toy. We might not be able to stop them from stealing our data, but we have proven ourselves adept at copying things and leaking software, and this is what we need to do. AI is already too dangerous and too powerful for a select few corporations to control.
As long as AI is in the hands of corporations, not people, the AI will serve their goals, not ours. This needs to change, so this is what I propose for our next voyage.
Woah, this is awesome work. I’m amazed as usual with the open source community and with people willing to share their computation for this.
Closed-source AI models.
The Books3 corpus would like you to know that all the data in it is from copyrighted books. It has reportedly been widely used in closed-source LLMs. “Rules for thee, not for me” shit. They’ll break copyright and then copyright what they made from it.
https://huggingface.co/datasets/the_pile_books3
Books3 is literally everything from the Bibliotik private tracker for books.
So yeah, fuckin roll out the cannons, mateys, let’s sink these hypocritical fuckers.
You’re allowed to train on copyrighted works, it isn’t illegal for anybody. This article by Kit Walsh does a good job of breaking it down. She’s a senior staff attorney at the EFF.
I didn’t say it was illegal, I said it was hypocritical.
Oh, my bad.
This has the same vibe as GitHub (owned by Microsoft) training its AI Copilot on repositories under the GPL license, which specifically forbids making any work based on it proprietary. Literally a blatant disregard for the license, but it’s ok because it’s a mega-corporation doing it.
You are going straight for the One Piece