GenAI tools ‘could not exist’ if firms are made to pay copyright

  • Valen@lemmy.world
    88 upvotes · 21 downvotes · 10 months ago

    So they’re admitting that their entire business model requires them to break the law. Sounds like they shouldn’t exist.

    • Marcbmann@lemmy.world
      33 upvotes · 7 downvotes · 10 months ago

      Reproduction of copyrighted material would be breaking the law. Studying it and using it as reference when creating original content is not.

        • hglman@lemmy.ml
          12 upvotes · 4 downvotes · 10 months ago

          So if a tool is involved, it’s no longer OK? So people with glasses cannot consume copyrighted material?

        • LainTrain@lemmy.dbzer0.com
          6 upvotes · 10 months ago

          What’s the difference? Humans are just the intent suppliers; the rest of the art is mostly made possible by software, whether Photoshop or Stable Diffusion.

        • Marcbmann@lemmy.world
          2 upvotes · 2 downvotes · 10 months ago

          I don’t agree. The publisher of the material does not get to dictate what it is used for. What are we protecting at the end of the day and why?

          In the case of a textbook, someone worked hard to explain certain materials in a certain way to make the material easily digestible. They produced examples to explain concepts. Reproducing and disseminating that material would be unfair to the author who worked hard to produce it.

          But the author does not have control over the knowledge gained. They cannot tell the reader that they are forbidden from using that knowledge to tutor another person in calculus. That would be absurd.

          IP law protects the works of the creator. The author of a calculus textbook did not invent calculus, so copyright law does not apply to calculus itself.

    • Even_Adder@lemmy.dbzer0.com
      34 upvotes · 9 downvotes · edited · 10 months ago

      It likely doesn’t break the law. You should check out this article by Kit Walsh, a senior staff attorney at the EFF, and this one by Katherine Klosek, the director of information policy and federal relations at the Association of Research Libraries.

      Headlines like these let people assume that it’s illegal, rather than educate people on their rights.

      • jacksilver@lemmy.world
        3 upvotes · 3 downvotes · 10 months ago

        The Kit Walsh article purposefully handwaves over a couple of points that could become bigger problems as lawsuits in this arena continue.

        1. He says that, given the size of the training data relative to the model, only about a byte of data per image could be stored in any compressed form, but this assumes all training data is treated equally. It’s very possible that certain images are compressed/stored in the weights more than others.

        2. The claim that these models don’t produce exact copies. Beyond the Getty issue, the New York Times recently published an article about a near-duplicate: https://www.nytimes.com/interactive/2024/01/25/business/ai-image-generators-openai-microsoft-midjourney-copyright.html.
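        To make point 1 concrete, here is a rough back-of-envelope sketch. The figures are hypothetical round numbers (an assumed ~2 GB of weights and ~2 billion training images, chosen so the even split comes out to about a byte per image), not the numbers from Walsh's article. The point it illustrates is that a per-image average is an aggregate bound: it says nothing about how capacity is distributed, so a small number of heavily duplicated images could each take far more than the average while everything else still averages a fraction of a byte.

```python
# Back-of-envelope sketch for point 1. All figures are hypothetical
# placeholders chosen for round numbers, not values from the article.
MODEL_BYTES = 2 * 10**9           # assumed model size: ~2 GB of weights
NUM_TRAINING_IMAGES = 2 * 10**9   # assumed training set: ~2 billion images


def average_bytes_per_image(model_bytes: int, num_images: int) -> float:
    """Weight capacity per image if it were spread evenly over the whole set."""
    return model_bytes / num_images


def average_for_the_rest(model_bytes: int, num_images: int,
                         num_outliers: int, bytes_per_outlier: int) -> float:
    """Average capacity left for the remaining images after a few outliers
    (e.g. heavily duplicated pictures) each take a much larger share."""
    remaining_bytes = model_bytes - num_outliers * bytes_per_outlier
    return remaining_bytes / (num_images - num_outliers)


if __name__ == "__main__":
    even = average_bytes_per_image(MODEL_BYTES, NUM_TRAINING_IMAGES)
    # Hypothetical skew: 1,000 duplicated images each get 1 MB of capacity.
    skewed = average_for_the_rest(MODEL_BYTES, NUM_TRAINING_IMAGES,
                                  num_outliers=1_000, bytes_per_outlier=10**6)
    print(f"evenly spread: {even:.2f} bytes per image")
    print(f"after 1,000 outliers at 1 MB each: {skewed:.2f} bytes per image for the rest")
```

        Run as-is, it prints 1.00 for the even split and 0.50 for the rest after the skew: the "about a byte per image" average holds in both cases, even though in the second case a thousand images each got enough capacity for a compressed picture.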

        I think some of the points he makes are valid, but he’s making a lot of assumptions about what is actually going on in these models, which we either don’t know for certain or have evidence against.

        I didn’t read Katherine’s article so maybe there is something more there.

        • Even_Adder@lemmy.dbzer0.com
          5 upvotes · 3 downvotes · edited · 10 months ago

          She addresses both of those, actually. The Midjourney thing isn’t new; it’s the sign of a poorly trained model.

          • jacksilver@lemmy.world
            3 upvotes · 2 downvotes · 10 months ago

            I’m not sure she does. I just read the article, and it focuses primarily on what models can train on. However, the real meat of the issue with GenAI, at least I think, is what it produces.

            For example, if I built a model that just spit out exact frames from “Space Jam”, I don’t think anyone would argue that it wasn’t a problem. The question is where the line is.

            • Even_Adder@lemmy.dbzer0.com
              4 upvotes · 1 downvote · edited · 10 months ago

              This part does:

              It’s not surprising that the complaints don’t include examples of substantially similar images. Research regarding privacy concerns suggests it is unlikely that a diffusion-based model will produce outputs that closely resemble one of the inputs.

              According to this research, there is a small chance that a diffusion model will store information that makes it possible to recreate something close to an image in its training data, provided that the image in question is duplicated many times during training. But the chances of an image in the training data set being duplicated in output, even from a prompt specifically designed to do just that, are literally less than one in a million.

              The linked paper goes into more detail.

              On the subject of output, I think you’re responsible for infringing works whether you used Photoshop, copy & paste, or a generative model. Also, specific instances will need to be evaluated individually, and there might be models that don’t qualify. Midjourney’s new model is so poorly trained that it’s downright easy to get these bad outputs.

              • jacksilver@lemmy.world
                2 upvotes · 2 downvotes · 10 months ago

                This goes back to my previous comment about handwaving away the details. There is a model out there that is clearly reproducing copyrighted material almost identically (the NYT article), and we also have cases of models spitting out training data: https://www.wired.com/story/chatgpt-poem-forever-security-roundup/. Clearly, people studying these models don’t fully know what is actually possible.

                Additionally, it only takes one instance to show that these models, in general, can and do have issues with regurgitating copyrighted data. Whether that passes the bar for legal consequences we’ll have to see, but I think it’s dangerous to take at face value a couple of statements made by people who don’t seem to understand the unknowns in this space.

                • Even_Adder@lemmy.dbzer0.com
                  3 upvotes · 1 downvote · 10 months ago

                  The article dealt with Stable Diffusion, the only open model, and so the only one people could study. If there were more problems with Stable Diffusion, we’d have heard of them by now. This is the critical advantage open-source development offers here: by making AI accessible, we maximize public participation and understanding, foster responsible development, and prevent harmful attempts at control.

                  As it stands, she was much better informed than you are and is an expert in law to boot. You, on the other hand, are making a sweeping generalization that leads right into an appeal to ignorance. It’s dangerous to assert a proposition just because it has not been proven false.

      • Telodzrum@lemmy.world
        12 upvotes · 1 downvote · 10 months ago

        This ruling only applies to the 2nd Circuit, and SCOTUS has yet to take up a case. As soon as a circuit split gives the Supreme Court a good fact pattern, you’ll get a nationwide answer. You’ll also note that the decision is deliberately written to set an extremely narrow precedent and is likely restricted to Google Books and near-identical sources of information.