The research from Purdue University, first spotted by news outlet Futurism, was presented earlier this month at the Computer-Human Interaction Conference in Hawaii and looked at 517 programming questions on Stack Overflow that were then fed to ChatGPT.
“Our analysis shows that 52% of ChatGPT answers contain incorrect information and 77% are verbose,” the new study explained. “Nonetheless, our user study participants still preferred ChatGPT answers 35% of the time due to their comprehensiveness and well-articulated language style.”
Disturbingly, programmers in the study didn’t always catch the mistakes being produced by the AI chatbot.
“However, they also overlooked the misinformation in the ChatGPT answers 39% of the time,” according to the study. “This implies the need to counter misinformation in ChatGPT answers to programming questions and raise awareness of the risks associated with seemingly correct answers.”
Who would have thought that an artificial intelligence trained on human intelligence would be just as dumb?
Hm. This is what I got.
I think about 90% of the screenshots we see of LLMs failing hilariously are doctored. Lemmy users really want to believe it’s that bad though.
Edit:
Yesterday, someone posted a doctored one on here saying everyone eats it up even if you use a ridiculous font in your poorly doctored photo. People who want to believe are quite easy to fool.
“Major new Technology still in Infancy Needs Improvements”
– headline every fucking day
“Will this technology save us from ourselves, or are we just jerking off?”
GPT-2 came out a little more than 5 years ago, it answered 0% of questions accurately and couldn’t string a sentence together.
GPT-3 came out a little less than 4 years ago and was kind of a neat party trick, but I’m pretty sure answered ~0% of programming questions correctly.
GPT-4 came out a little less than 2 years ago and can answer 48% of programming questions accurately.
I’m not talking about morality, or creativity, or good/bad for humanity, but if you don’t see a trajectory here, I don’t know what to tell you.
The study is using 3.5, not version 4.
4 produces inaccurate programming answers too
Obviously. But it is FAR better yet again.
Not really. I ask it questions all the time and it makes shit up.
Yes. But it is better than 3.5 without any doubt.
In what year do you estimate AI will have 90% accuracy?
No clue. Somewhere between a few years (assuming some unexpected breakthrough) and many decades? The consensus from experts (of which I am not one) seems to be somewhere in the 2030s/40s for AGI. I’m guessing accuracy will probably be more on a topic-by-topic basis; LLMs might never even get there, or only for things they’ve been heavily trained on. If predictive text doesn’t do it, then I would be betting on whatever Yann LeCun is working on.
Lemmy seems to be very near-sighted when it comes to the exponential curve of AI progress. I think this is because the community is very anti-corp.
My experience with an AI coding tool today.
Me: Can you optimize this method.
AI: Okay, here’s an optimized method.
Me seeing the AI completely removed a critical conditional check.
Me: Hey, you completely removed this check with variable xyz
AI: oops you’re right, here you go I fixed it.
It did this 3 times on 3 different optimization requests.
It was 0 for 3
Although there were some good suggestions once you get past the blatant first error.
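One way to catch a silently dropped check is to pin it down with a small regression test before asking for the optimization. Here is a minimal C++ sketch of that idea. The function `average` and its empty-input guard are hypothetical stand-ins, not the actual method or the `xyz` check from the anecdote above.

```cpp
#include <vector>

// Hypothetical method under "optimization": mean of a list of readings.
// The guard on empty input is the kind of conditional that a rewrite
// might quietly remove.
double average(const std::vector<double>& readings) {
    if (readings.empty()) {
        return 0.0;  // critical check: avoids dividing by zero below
    }
    double sum = 0.0;
    for (const double r : readings) {
        sum += r;
    }
    return sum / static_cast<double>(readings.size());
}

// Regression check: returns true only while the guard is still in place.
// Without the guard, 0.0 / 0.0 yields NaN, which never compares equal to 0.0.
bool guard_still_present() {
    const std::vector<double> empty;
    return average(empty) == 0.0;
}
```

Re-running `guard_still_present()` after each suggested rewrite would flag the missing check without having to spot it by eye.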
My favorite is when I ask for something and it gets stuck in a loop, pasting the same comment over and over
Yeah it’s wrong a lot but as a developer, damn it’s useful. I use Gemini for asking questions and Copilot in my IDE personally, and it’s really good at doing mundane text editing bullshit quickly and writing boilerplate, which is a massive time saver. Gemini has at least pointed me in the right direction with quite obscure issues or helped pinpoint the cause of hidden bugs many times. I treat it like an intelligent rubber duck rather than expecting it to just solve everything for me outright.
Same here. It’s good for writing your basic unit tests, and the explain feature is useful for getting your head wrapped around complex syntax, especially as bad as searching for useful documentation has gotten on Google and ddg.
ChatGPT and GitHub Copilot are great tools, but they’re like a chainsaw: if you apply them incorrectly or become too casual and careless with them, they will kick back at you and fuck your day up.
I will resort to ChatGPT for coding help every so often. I’m a fairly experienced programmer, so my questions usually tend to be somewhat complex. I’ve found that it’s extremely useful for those problems that fall into the category of “I could solve this myself in 2 hours, or I could ask AI to solve it for me in seconds.” Usually, I’ll get a working solution, but almost every single time, it’s not a good solution. It provides a great starting-off point to write my own code.
Some of the issues I’ve found (speaking as a C++ developer) are: Variables not declared “const,” extremely inefficient use of data structures, ignoring modern language features, ignoring parallelism, using an improper data type, etc.
ChatGPT is great for generating ideas, but it’s going to be a while before it can actually replace a human developer. Producing code that works isn’t hard; producing code that’s good requires experience.
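To make the "works but isn't good" distinction concrete, here is a small C++ sketch of the kinds of issues listed above. Both functions are made up for illustration (they are not actual ChatGPT output): the first copies its arguments, does a quadratic scan, and indexes with a signed `int`; the second takes `const` references, uses a hash set for lookups, and uses a range-based `for`.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// "Working" version: parameters passed by value (needless copies),
// nested O(n * m) scan, and int indices cast from size_t.
std::vector<std::string> common_words_naive(std::vector<std::string> a,
                                            std::vector<std::string> b) {
    std::vector<std::string> out;
    for (int i = 0; i < static_cast<int>(a.size()); i++) {
        for (int j = 0; j < static_cast<int>(b.size()); j++) {
            if (a[i] == b[j]) {
                out.push_back(a[i]);
                break;  // record each element of a at most once
            }
        }
    }
    return out;
}

// Reworked version: const references, a const hash set for average O(1)
// lookups, and a range-based for loop. Returns the same result as above.
std::vector<std::string> common_words(const std::vector<std::string>& a,
                                      const std::vector<std::string>& b) {
    const std::unordered_set<std::string> lookup(b.begin(), b.end());
    std::vector<std::string> out;
    for (const auto& word : a) {
        if (lookup.count(word) != 0) {
            out.push_back(word);
        }
    }
    return out;
}
```

Both versions return the elements of `a` that also appear in `b`, in `a`'s order; the difference only shows up in readability and in how they scale with larger inputs.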
This has been my experience as well. If you already know what you are doing, LLMs can be a great tool. If you are inexperienced, you cannot assess the quality nor the accuracy of the answers, and are using the LLM to replace your own learning.
I like to draw the parallel to people that have learnt to paint only using digital tools. They often show a particular colouring that shows a lack of understanding of colour theory. Because pipette tools mean that you never have to mix colours, you never have to learn to do so. Painting with physical paint isn’t superior, but it presents a hurdle (mixing paint) that is crucial to learn to overcome. Many digital-only artists will still have learnt on traditional media. Once you have the knowledge, the pipette and colour pickers are just a tool, no longer inhibiting anything.
They’ve done studies: 48% of the time, it works every time.
People downvote me when I point this out in response to “AI will take our jobs” doomerism.
I mean, AI eventually will take our jobs, and with any luck it’ll be a good thing when that happens. Just because Chat GPT v3 (or w/e) isn’t up to the task doesn’t mean v12 won’t be.
I’m not so sure about the “it’ll be good” part. I’d like to imagine a world where people don’t have to work because everything is done by robots but in reality you’ll have some companies that will make trillions while everyone else will go hungry and become poor and homeless.
Yes, that’s exactly the scenario we need to avoid. Automated gay space communism would be ideal, but social democracy might do in a pinch. A sufficiently well-designed tax system coupled with a robust welfare system should make the transition survivable, but the danger with making that our goal is allowing the private firms enough political power that they can reverse the changes.
Yes, this is also true. I see things like UBI as an inevitable necessity, because AI and automation in general will eliminate the need for most companies to employ humans. Our capitalistic system is set up in a way such that a person can sell their ability to work and provide value to the owner class, but if that dynamic is ever challenged on a fundamental level, it will violently collapse when people who can’t get jobs because a robot replaced them either reject automation to preserve the status quo or embrace a new dynamic that provides for the population’s basic needs without requiring them to be productive.
But the way that managers talk about AI makes it sound like the techbros have convinced everybody that AI is far more powerful than it currently is, which is a glorified chatbot with access to unfiltered Google search results.
If it’s possible for AI to reach that level. We shouldn’t take for granted it’s possible.
I was really humbled when I learned that a cubic mm of human brain matter took over a petabyte to map. It suggests to me that AI is nowhere close to the level you’re describing.
It suggests to me that AI
This is a fallacy. Specifically, I think you’re committing the informal fallacy of confusing necessary and sufficient conditions. That is to say, we know that if we can reliably simulate a human brain, then we can make an artificial sophont (this is true by mere definition). However, we have no idea what the minimum hardware requirements are for a sufficiently optimized program that runs a sapient mind. Note: I am setting aside what the definition of sapience is, because if you ask 2 different people you’ll get 20 different answers.
We shouldn’t take for granted it’s possible.
I’m pulling from a couple decades of philosophy and conservative estimates of the upper limits of what’s possible as well as some decently-founded plans on how it’s achievable. Suffice it to say, after immersing myself in these discussions for as long as I have I’m pretty thoroughly convinced that AI is not only possible but likely.
The canonical argument goes something like this: if brains are magic, we cannot say if humanlike AI is possible. If brains are not magic, then we know that natural processes can create sapience. Since natural processes can create sapience, it is extraordinarily unlikely that it will prove impossible to create it artificially.
So with our main premise (AI is possible) cogently established, we need to ask the question: “since it’s possible, will it be done, and if not why?” There are a great many advantages to AI, and while there are many risks, the barrier of entry for making progress is shockingly low. We are talking about the potential to create an artificial god with all the wonders and dangers that implies. It’s like a nuclear weapon if you didn’t need to source the uranium; everyone wants to have one, and no one wants their enemy to decide what it gets used for. So everyone has the incentive to build it (it’s really useful) and everyone has a very powerful disincentive against forbidding the research (there’s no way to stop everyone who wants to, and so the people who’d listen are the people who would make an AI who’ll probably be friendly). So what possible scenario do we have that would mean strong general AI (let alone the simpler things that’d replace everyone’s jobs) never gets developed? The answers range from total societal collapse to extinction, which are all worse than a bad transition to full automation.
So either AI steals everyone’s job or something worse happens.
Thanks for the detailed and thought provoking response. I stand corrected. I appreciate the depth you went into!
You’re welcome! I’m always happy to learn someone re-evaluated their position in light of new information that I provided. 🙂
Yes, there are mistakes, but if you point it in the right direction, it can give you correct answers.
In my experience, if you have the necessary skills to point it in the right direction, you don’t need to use it in the first place.
It’s just a convenience, not a magic wand. Sure, relying on AI blindly and exclusively is a horrible idea (that lots of people peddle and quite a few suckers buy), but there’s room for a supervised and careful use of AI, same as we started using Google instead of manpages and (grudgingly, for the older of us) tolerated the addition of syntax highlighting and even some code completion to all but the most basic text editors.
Yesterday, I wrote all of this working JavaScript code: https://github.com/igorlogius/gather-from-tabs/discussions/8 And I don’t know a lick of JavaScript. I know other languages, but that was barely needed. I just gave it plain-language instructions and reported the errors until it worked.
It can, it also sometimes can’t unless you ask it “could it be x answer”
I always thought of it as a tool to write boilerplate faster, so no surprises for me
Better than Jerry in the next cubicle over.
Sounds low
Yes, and even if it was only right 1% of the time it would still be amazing
Also hallucinations are not a universally bad thing.
Not a programmer by any means (haven’t done any since college) but I’ve asked it for help in writing Jira queries or Excel mess and it’s been pretty solid with that stuff.
Well, I do it 99% of the time