Google apologizes for ‘missing the mark’ after Gemini generated racially diverse Nazis::Google says it’s aware of historically inaccurate results for its Gemini AI image generator, following criticism that it depicted historically white groups as people of color.

  • xantoxis@lemmy.world
    link
    fedilink
    English
    arrow-up
    80
    arrow-down
    9
    ·
    9 months ago

    I don’t know how you’d solve the problem of making a generative AI accurately create a slate of images that both a) inclusively produces people with diverse characteristics and b) understands the context of what characteristics could feasibly be generated.

    But that’s because the AI doesn’t know how to solve the problem.

    Because the AI doesn’t know anything.

    Real intelligence simply doesn’t work like this, and every time you point it out someone shouts “but it’ll get better”. It still won’t understand anything unless you teach it exactly what the solution to a prompt is. It won’t, for example, interpolate its knowledge of what US senators look like with the knowledge that all of them were white men for a long period of American history.

    • random9@lemmy.world
      link
      fedilink
      English
      arrow-up
      31
      arrow-down
      3
      ·
      9 months ago

      You don’t do what Google seems to have done - inject diversity artificially into prompts.

      You solve this by training the AI on actual, accurate, diverse data for the given prompt. For example, for “american woman” you definitely could find plenty of pictures of American women from all sorts of racial backgrounds, and use that to train the AI. For “german 1943 soldier” the accurate historical images are obviously far less likely to contain racially diverse people in them.

      If Google has indeed already done that, and then still had to artificially force racial diversity, then their AI training model is bad and unable to handle that a single input can match to different images, instead of the most prominent or average of its training set.

      • xantoxis@lemmy.world
        link
        fedilink
        English
        arrow-up
        12
        arrow-down
        1
        ·
        9 months ago

        Ultimately this is futile though, because you can do that for these two specific prompts until the AI appears to “get it”, but it’ll still screw up a prompt like “1800s Supreme Court justice” or something because it hasn’t been trained on that. Real intelligence requires agency to seek out new information to fill in its own gaps; and a framework to be aware of what the gaps are. Through exploration of its environment, a real intelligence connects things together, and is able to form new connections as needed. When we say “AI doesn’t know anything” that’s what we mean–understanding is having a huge range of connections and the ability to infer new ones.

        • TheGreenGolem@lemmy.dbzer0.com
          link
          fedilink
          English
          arrow-up
          8
          ·
          9 months ago

          That’s why I hate that they started to call them artificial intelligence. There is nothing intelligent in them at all. They work on probability based on a shit ton of data, that’s all. That’s not intelligence, that’s basically brute force. But there is no going back at this point, I know.

        • kromem@lemmy.world
          link
          fedilink
          English
          arrow-up
          3
          arrow-down
          3
          ·
          edit-2
          9 months ago

          Oh really? Here’s Gemini’s response to “What would the variety of genders and skin tones of the supreme court in the 1800s have been?”

          The Supreme Court of the United States in the 1800s was far from diverse in terms of gender and skin tone. Throughout the entire 19th century, all the justices were white men. Women were not even granted the right to vote until 1920, and there wasn’t a single person of color on the Supreme Court until Thurgood Marshall was appointed in 1967.

          Putting the burden of contextualization on the LLM would have avoided this issue.

    • TORFdot0@lemmy.world
      link
      fedilink
      English
      arrow-up
      19
      arrow-down
      2
      ·
      edit-2
      9 months ago

      Edit: further discussion on the topic has changed my viewpoint on this, its not that its been trained wrong on purpose and now its confused, its that everything its being asked is secretly being changed. It’s like a child being told to make up a story by their teacher when the principal asked for the right answer.

      Original comment below


      They’ve purposefully overrode its training to make it create more PoCs. It’s a noble goal to have more inclusivity but we purposely trained it wrong and now it’s confused, the same thing as if you lied to a child during their education and then asked them for real answers, they’ll tell you the lies they were taught instead.

      • TwilightVulpine@lemmy.world
        link
        fedilink
        English
        arrow-up
        9
        ·
        9 months ago

        This result is clearly wrong, but it’s a little more complicated than saying that adding inclusivity is purposedly training it wrong.

        Say, if “entrepreneur” only generated images of white men, and “nurse” only generated images of white women, then that wouldn’t be right either, it would just be reproducing and magnifying human biases. Yet this a sort of thing that AI does a lot, because AI is a pattern recognition tool inherently inclined to collapse data into an average, and data sets seldom have equal or proportional samples for every single thing. Human biases affect how many images we have of each group of people.

        It’s not even just limited to image generation AIs. Black people often bring up how facial recognition technology is much spottier to them because the training data and even the camera technology was tuned and tested mainly for white people. Usually that’s not even done deliberately, but it happens because of who gets to work on it and where it gets tested.

        Of course, secretly adding “diverse” to every prompt is also a poor solution. The real solution here is providing more contextual data. Unfortunately, clearly, the AI is not able to determine these things by itself.

        • TORFdot0@lemmy.world
          link
          fedilink
          English
          arrow-up
          4
          ·
          9 months ago

          I agree with your comment. As you say, I doubt the training sets are reflective of reality either. I guess that leaves tampering with the prompts to gaslight the AI into providing results it wasn’t asked for is the method we’ve chosen to fight this bias.

          We expect the AI to give us text or image generation that is based in reality but the AI can’t experience reality and only has the knowledge of the training data we provide it. Which is just an approximation of reality, not the reality we exist in. I think maybe the answer would be training users of the tool that the AI is doing the best it can with the data it has. It isn’t racist, it is just ignorant. Let the user add diverse to the prompt if they wish, rather than tampering with the request to hide the insufficiencies in the training data.

          • TwilightVulpine@lemmy.world
            link
            fedilink
            English
            arrow-up
            3
            ·
            9 months ago

            I wouldn’t count on the user realizing the limitations of the technology, or the companies openly admitting to it at expense of their marketing. As far as art AI goes this is just awkward, but it worries me about LLMs, and people using it expecting it to respond with accurate, applicable information, only to come out of it with very skewed worldviews.

        • cheese_greater@lemmy.world
          link
          fedilink
          English
          arrow-up
          1
          ·
          edit-2
          9 months ago

          Why couldn’t it be tuned to simply randomize the skin tone where not otherwise specified? Like if its all completely arbitrary just randomize stuff, problem-solved?

    • kromem@lemmy.world
      link
      fedilink
      English
      arrow-up
      2
      ·
      edit-2
      9 months ago

      Actually the way you get it to do better is to put more of the burden on interpreting the context on the LLM instead of heavy handed instructions - because the LLMs do understand the context.

      For example, here’s Gemini answering what the physical characteristics of 1940s soldiers in Germany might have looked like:

      During the Nazi regime in 1940s Germany, racial ideology strictly dictated who was deemed “suitable” for military service. The Wehrmacht, the unified armed forces, prioritized individuals deemed “pure Aryans” based on Nazi racial criteria. These criteria favored individuals with blond hair, blue eyes, and “Nordic” features.

      However, it’s important to remember that the reality was more nuanced. As the war progressed and manpower needs intensified, the Nazis relaxed their racial restrictions to some extent, including conscripting individuals with mixed ancestry or physical “imperfections.” Additionally, some minority groups like the Volksdeutsche, Germans living in Eastern Europe, were also incorporated.

      I think it could have managed to contextualize the prompt correctly if given the leeway in the instructions. Instead, what’s happened is the instructions given to it ask it to behind the scenes modify the prompt in broad application to randomly include diversity modifiers to what is asked for. So “image of 1940s German soldier” is being modified to “image of black woman 1940s German soldier” for one generation and “image of Asian man 1940s German soldier” for another, which leads to less than ideal results. It should instead be encouraged to modify for diversity and representation relative to the context of the request.

      • fidodo@lemmy.world
        link
        fedilink
        English
        arrow-up
        2
        ·
        9 months ago

        I think a lot of the improvement will come from breaking down the problem using sub assistant for specific actions. So in this case you’re asking for an image generation action involving people, then an LLM specifically designed for that use case can take over tuned for that exact use case. I think it’ll be hard to keep an LLM on task if you have one prompt trying to accomplish every possible outcome, but you can make it more specific to handle sub tasks more accurately. We could even potentially get an LLM to dynamically create sub assistants based on the use case. Right now the tech is too slow to do all this stuff at scale and in real time, but it will get faster. The problem right now isn’t that these fixes aren’t possible, it’s that they’re hard to scale.

        • kromem@lemmy.world
          link
          fedilink
          English
          arrow-up
          1
          ·
          9 months ago

          Yes, this is exactly correct. And it’s not actually too slow - the specialized models can be run quite quickly, and there’s various speedups like Groq.

          The issue is just more cost of multiple passes, so companies are trying to have it be “all-in-one” even though cognitive science in humans isn’t an all-in-one process either.

          For example, AI alignment would be much better if it took inspiration from the prefrontal cortex inhibiting intrusive thoughts rather than trying to prevent the generation of the equivalent of intrusive thoughts in the first place.

          • fidodo@lemmy.world
            link
            fedilink
            English
            arrow-up
            1
            ·
            9 months ago

            The issue is just more cost of multiple passes, so companies are trying to have it be “all-in-one”

            Exactly, that’s where the too slow part comes in. To get more robust behavior it needs multiple layers of meta analysis, but that means it would take way more text generation under the hood than what’s needed for one shot output.

            • kromem@lemmy.world
              link
              fedilink
              English
              arrow-up
              1
              ·
              9 months ago

              Yes, but in terms of speed you don’t need the same parameters and quantization for the secondary layers.

              If you haven’t seen it, see how fast a very capable model can actually be: https://groq.com/

              • fidodo@lemmy.world
                link
                fedilink
                English
                arrow-up
                1
                ·
                9 months ago

                Yeah I’ve seen that. I think things will get much faster very quickly, I’m just commenting on the first Gen tech we’re seeing right now.

    • fidodo@lemmy.world
      link
      fedilink
      English
      arrow-up
      2
      arrow-down
      1
      ·
      9 months ago

      Worked fine for me:

      System Prompt:

      You are a bot that generates prompts for image generation based on a user’s query. You should add detail so the image prompt can accurately generate what the user intended.

      You must add diversity to the prompt when appropriate to ensure the image generator will have diverse output, but only do so when appropriate.

      You must add race and skin color details for historic accuracy as the image generation algorithm does not know how to take that into account.

      You output should just be the prompt and nothing else. All following inputs will be user inputs.

      User Prompt:

      A 1940s german soldier

      Assistant Message:

      A 1940s German soldier in uniform, standing on a European battlefield, white male with short hair, holding a standard-issue rifle.

      User Prompt:

      An british woman

      Assistant Message:

      A British woman, reflecting diverse backgrounds, in contemporary casual wear, showing a range of ages and hairstyles.

    • FooBarrington@lemmy.world
      link
      fedilink
      English
      arrow-up
      16
      arrow-down
      16
      ·
      edit-2
      9 months ago

      I’ll get the usual downvotes for this, but:

      Because the AI doesn’t know anything.

      is untrue, because current AI fundamentally is knowledge. Intelligence fundamentally is compression, and that’s what the training process does - it compresses large amounts of data into a smaller size (and of course loses many details in the process).

      But there’s no way to argue that AI doesn’t know anything if you look at its ability to recreate a great number of facts etc. from a small amount of activations. Yes, not everything is accurate, and it might never be perfect. I’m not trying to argue that “it will necessarily get better”. But there’s no argument that labels current AI technology as “not understanding” without resorting to a “special human sauce” argument, because the fundamental compression mechanisms behind it are the same as behind our intelligence.

      Edit: yeah, this went about as expected. I don’t know why the Lemmy community has so many weird opinions on AI topics.

      • sxt@lemmy.world
        link
        fedilink
        English
        arrow-up
        7
        ·
        9 months ago

        Part of the problem with talking about these things in a casual setting is that nobody is using precise enough terminology to approach the issue so others can actually parse specifically what they’re trying to say.

        Personally, saying the AI “knows” something implies a level of cognizance which I don’t think it possesses. LLMs “know” things the way an excel sheet can.

        Obviously, if we’re instead saying the AI “knows” things due to it being able to frequently produce factual information when prompted, then yeah it knows a lot of stuff.

        I always have the same feeling when people try to talk about aphantasia or having/not having an internal monologue.

        • FooBarrington@lemmy.world
          link
          fedilink
          English
          arrow-up
          3
          arrow-down
          8
          ·
          9 months ago

          I can ask AI models specific questions about knowledge it has, which it can correctly reply to. Excel sheets can’t do that.

          That’s not to say the knowledge is perfect - but we know that AI models contain partial world models. How do you differentiate that from “cognizance”?

          • rambaroo@lemmy.world
            link
            fedilink
            English
            arrow-up
            7
            arrow-down
            4
            ·
            edit-2
            9 months ago

            Omg give me a break with this complete nonsense. LLMs are not an intelligence. They are language processors. They do not “think” about anything and don’t have any level of self awareness that implies cognizance. A cognizant ai would have recognized that the Nazis it was creating looked historically inaccurate, based on its training data. But guess what, it didn’t do that because it’s fundamentally incapable of thinking about anything.

            So sick of reading this amateurish bullshit on social media.

            • FooBarrington@lemmy.world
              link
              fedilink
              English
              arrow-up
              2
              arrow-down
              6
              ·
              9 months ago

              A cognizant ai would have recognized that the Nazis it was creating looked historically inaccurate, based on its training data.

              Do you understand that the model is specifically prompted to create “historically inaccurate looking Nazis”? Models aren’t supposed to inject their own guidelines and rules, they simply produce output for your input. If you tell it to produce black Hitler it will produce a black Hitler. Do you expect the model to instead produce white Hitler?

      • thehatfox@lemmy.world
        link
        fedilink
        English
        arrow-up
        4
        ·
        9 months ago

        Knowledge is a bit more than just handling data, and in terms of intelligence it also involves understanding. I don’t think knowledge in an intelligent sense can be reduced to summarising data to keywords, and the reverse.

        In those terms an encyclopaedia is also knowledge, but not in an intelligent way.

        • FooBarrington@lemmy.world
          link
          fedilink
          English
          arrow-up
          1
          arrow-down
          7
          ·
          9 months ago

          I’m not saying knowledge is summarising data to keywords, where did you get that?

          Intelligence is compression, and the training process compresses data. There is no “summarising” here.

      • kromem@lemmy.world
        link
        fedilink
        English
        arrow-up
        3
        arrow-down
        2
        ·
        edit-2
        9 months ago

        Lemmy hasn’t met a pitchfork it doesn’t pick up.

        You are correct. The most cited researcher in the space agrees with you. There’s been a half dozen papers over the past year replicating the finding that LLMs generate world models from the training data.

        But that doesn’t matter. People love their confirmation bias.

        Just look at how many people think it only predicts what word comes next, thinking it’s a Markov chain and completely unaware of how self-attention works in transformers.

        The wisdom of the crowd is often idiocy.

        • FooBarrington@lemmy.world
          link
          fedilink
          English
          arrow-up
          3
          arrow-down
          2
          ·
          9 months ago

          Thank you very much. The confirmation bias is crazy - one guy is literally trying to tell me that AI generators don’t have knowledge because, when asking it for a picture of racially diverse Nazis, you get a picture of racially diverse Nazis. The facts don’t matter as long as you get to be angry about stupid AIs.

          It’s hard to tell a difference between these people and Trump supporters sometimes.

          • kromem@lemmy.world
            link
            fedilink
            English
            arrow-up
            3
            arrow-down
            1
            ·
            edit-2
            9 months ago

            It’s hard to tell a difference between these people and Trump supporters sometimes.

            To me it feels a lot like when I was arguing against antivaxxers.

            The same pattern of linking and explaining research but having it dismissed because it doesn’t line up with their gut feelings and whatever they read when “doing their own research” guided by that very confirmation bias.

            The field is moving faster than any I’ve seen before, and even people working in it seem to be out of touch with the research side of things over the past year since GPT-4 was released.

            A lot of outstanding assumptions have been proven wrong.

            It’s a bit like the early 19th century in physics, where everyone assumed things that turned out wrong over a very short period where it all turned upside down.

            • FooBarrington@lemmy.world
              link
              fedilink
              English
              arrow-up
              3
              arrow-down
              1
              ·
              9 months ago

              Exactly. They have very strong feelings that they are right, and won’t be moved - not by arguments, research, evidence or anything else.

              Just look at the guy telling me “they can’t reason!”. I asked whether they’d accept they are wrong if I provide a counter example, and they literally can’t say yes. Their world view won’t allow it. If I’m sure I’m right that no counter examples exist to my point, I’d gladly say “yes, a counter example would sway me”.

  • jacksilver@lemmy.world
    link
    fedilink
    English
    arrow-up
    51
    arrow-down
    2
    ·
    9 months ago

    It’s great seeing time and time again that no one really does understand these models and that their preconceived notions of what biases exist ends up shooting them in the foot. It truly shows that they don’t really understand how systematically problematic the underlying datasets are and the repurcussions of relying on them too heavily.

  • RGB3x3@lemmy.world
    link
    fedilink
    English
    arrow-up
    30
    arrow-down
    1
    ·
    9 months ago

    A Washington Post investigation last year found that prompts like “a productive person” resulted in pictures of entirely white and almost entirely male figures, while a prompt for “a person at social services” uniformly produced what looked like people of color. It’s a continuation of trends that have appeared in search engines and other software systems.

    This is honestly fascinating. It’s putting human biases on full display at a grand scale. It would be near-impossible to quantify racial biases across the internet with so much data to parse. But these LLMs ingest so much of it and simplify the data all down into simple sentences and images that it becomes very clear how common the unspoken biases we have are.

    There’s a lot of learning to be done here and it would be sad to miss that opportunity.

    • kromem@lemmy.world
      link
      fedilink
      English
      arrow-up
      5
      arrow-down
      8
      ·
      9 months ago

      It’s putting human biases on full display at a grand scale.

      Not human biases. Biases in the labeled data set. Those could sometimes correlate with human biases, but they could also not correlate.

      But these LLMs ingest so much of it and simplify the data all down into simple sentences and images that it becomes very clear how common the unspoken biases we have are.

      Not LLMs. The image generation models are diffusion models. The LLM only hooks into them to send over the prompt and return the generated image.

        • kromem@lemmy.world
          link
          fedilink
          English
          arrow-up
          3
          ·
          9 months ago

          If you train on Shutterstock and end up with a bias towards smiling, is that a human bias, or a stock photography bias?

          Data can be biased in a number of ways, that don’t always reflect broader social biases, and even when they might appear to, the cause vs correlation regarding the parallel isn’t necessarily straightforward.

          • VoterFrog@lemmy.world
            link
            fedilink
            English
            arrow-up
            1
            ·
            9 months ago

            I mean “taking pictures of people who are smiling” is definitely a bias in our culture. How we collectively choose to record information is part of how we encode human biases.

            I get what you’re saying in specific circumstances. Sure, a dataset that is built from a single source doesn’t make its biases universal. But these models were trained on a very wide range of sources. Wide enough to cover much of the data we’ve built a culture around.

            • kromem@lemmy.world
              link
              fedilink
              English
              arrow-up
              2
              ·
              edit-2
              9 months ago

              Except these kinds of data driven biases can creep in from all sorts of ways.

              Is there a bias in what images have labels and what don’t? Did they focus only on English labeling? Did they use a vision based model to add synthetic labels to unlabeled images, and if so did the labeling model introduce biases?

              Just because the sampling is broad doesn’t mean the processes involved don’t introduce procedural bias distinct from social biases.

  • Jeom@lemmy.world
    link
    fedilink
    English
    arrow-up
    31
    arrow-down
    2
    ·
    9 months ago

    inclusivity is obviously good but what googles doing just seems all too corporate and plastic

    • guajojo@lemmy.world
      link
      fedilink
      English
      arrow-up
      17
      arrow-down
      1
      ·
      edit-2
      9 months ago

      It’s trying so hard to not be racist that is being even more racist than other AI, is hilarious

    • fidodo@lemmy.world
      link
      fedilink
      English
      arrow-up
      1
      ·
      edit-2
      9 months ago

      It’s brand new tech, they put on a bandaid solution, it wasn’t a complete solution and it failed. It’s not the result they ideally want and they are going to try to fix it. I don’t see what the big deal is. They were right to have diversity in mind, they just need to improve it to handle more use cases.

      I guess users got so used to the last Gen of tech being more polished than it was when it first came out that they forgot that software has bugs.

  • FinishingDutch@lemmy.world
    link
    fedilink
    English
    arrow-up
    25
    arrow-down
    2
    ·
    9 months ago

    Honestly, this sort of thing is what’s killing any sort of enjoyment and progress of these platforms. Between the INCREDIBLY harsh censorship that they apply and injecting their own spin on things like this, it’s nigh on impossible to get a good result these days.

    I want the tool to just do its fucking job. And if I specifically ask for a thing, just give me that. I don’t mind it injecting a bit of diversity in say, a crowd scene - but it’s also doing it in places where it’s simply not appropriate and not what I asked for.

    It’s even more annoying that you can’t even PAY to get rid of these restrictions and filters. I’d gladly pay to use one if it didn’t censor any prompt to death…

  • kaffiene@lemmy.world
    link
    fedilink
    English
    arrow-up
    20
    ·
    edit-2
    9 months ago

    Why would anyone expect “nuance” from a generative AI? It doesn’t have nuance, it’s not an AGI, it doesn’t have EQ or sociological knowledge. This is like that complaint about LLMs being “warlike” when they were quizzed about military scenarios. It’s like getting upset that the clunking of your photocopier clashes with the peaceful picture you asked it to copy

    • stockRot@lemmy.world
      link
      fedilink
      English
      arrow-up
      1
      arrow-down
      1
      ·
      9 months ago

      Why shouldn’t we expect more and better out of the technologies that we use? Seems like a very reactionary way of looking at the world

      • kaffiene@lemmy.world
        link
        fedilink
        English
        arrow-up
        5
        ·
        9 months ago

        I DO expect better use from new technologies. I don’t expect technologies to do things that they cannot. I’m not saying it’s unreasonable to expect better technology I’m saying that expecting human qualities from an LLM is a category error

  • yildolw@lemmy.world
    link
    fedilink
    English
    arrow-up
    13
    arrow-down
    5
    ·
    9 months ago

    Oh no, not racial impurity in my Nazi fanart generator! /s

    Maybe you shouldn’t use a plagiarism engine to generate Nazi fanart. Thanks

  • NotJustForMe@lemmy.ml
    link
    fedilink
    English
    arrow-up
    14
    arrow-down
    8
    ·
    9 months ago

    It’s okay when Disney does it. What a world. Poor AI, how are they supposed to learn if all its data is created by mentally ill and crazy people. ٩(。•́‿•̀。)۶

    • 🔍🦘🛎@lemmy.world
      link
      fedilink
      English
      arrow-up
      8
      ·
      9 months ago

      It’s a demonstration that the model is coded to include diversity, and it doesn’t generate 4 middle aged WASP moms

    • fidodo@lemmy.world
      link
      fedilink
      English
      arrow-up
      2
      ·
      9 months ago

      I think it’s an example of why they programmed in diversity, to ensure you get diverse responses, but they forgot about edge cases.

  • Copernican@lemmy.world
    link
    fedilink
    English
    arrow-up
    3
    arrow-down
    3
    ·
    9 months ago

    Is this that crazy though for AI since Hamilton the musical? All founding fathers were portrayed as people of color in the casting. Google image search for Alexander Hamilton is pulling a decent number of pictures from the musical cast.

  • Harbinger01173430@lemmy.world
    link
    fedilink
    English
    arrow-up
    1
    arrow-down
    2
    ·
    9 months ago

    …white is a color. Also white people usually look pink, cream, orange or red. Only albinos look the closest to white though not white enough.