The research from Purdue University, first spotted by news outlet Futurism, was presented earlier this month at the Computer-Human Interaction Conference in Hawaii and looked at 517 programming questions on Stack Overflow that were then fed to ChatGPT.

“Our analysis shows that 52% of ChatGPT answers contain incorrect information and 77% are verbose,” the new study explained. “Nonetheless, our user study participants still preferred ChatGPT answers 35% of the time due to their comprehensiveness and well-articulated language style.”

Disturbingly, programmers in the study didn’t always catch the mistakes being produced by the AI chatbot.

“However, they also overlooked the misinformation in the ChatGPT answers 39% of the time,” according to the study. “This implies the need to counter misinformation in ChatGPT answers to programming questions and raise awareness of the risks associated with seemingly correct answers.”

  • NounsAndWords@lemmy.world
    link
    fedilink
    English
    arrow-up
    35
    arrow-down
    14
    ·
    6 months ago

    GPT-2 came out a little more than 5 years ago, it answered 0% of questions accurately and couldn’t string a sentence together.

    GPT-3 came out a little less than 4 years ago and was kind of a neat party trick, but I’m pretty sure answered ~0% of programming questions correctly.

    GPT-4 came out a little less than 2 years ago and can answer 48% of programming questions accurately.

    I’m not talking about mortality, or creativity, or good/bad for humanity, but if you don’t see a trajectory here, I don’t know what to tell you.

    • egeres@lemmy.world
      link
      fedilink
      English
      arrow-up
      2
      ·
      6 months ago

      Lemmy seems to be very near-sighted when it comes to the exponential curve of AI progress, I think this is an effect because the community is very anti-corp

      • NounsAndWords@lemmy.world
        link
        fedilink
        English
        arrow-up
        3
        arrow-down
        1
        ·
        6 months ago

        No clue? Somewhere between a few years (assuming some unexpected breakthrough) or many decades? The consensus from experts (of which I am not) seems to be somewhere in the 2030s/40s for AGI. I’m guessing accuracy probably will be more on a topic by topic basis, LLMs might never even get there, or only related to things they’ve been heavily trained on. If predictive text doesn’t do it then I would be betting on whatever Yann LeCun is working on.