ChatGPT Answers Programming Questions Incorrectly 52% of the Time: Study

ForgottenFlux@lemmy.world · 6 months ago

BeatTakeshi@lemmy.world · 6 months ago

Who would have thought that an artificial intelligence trained on human intelligence would be just as dumb

capital@lemmy.world · 6 months ago

Hm. This is what I got.

I think about 90% of the screenshots we see of LLMs failing hilariously are doctored. Lemmy users really want to believe it’s that bad though.

AIhasUse@lemmy.world · 6 months ago

Yesterday, someone posted a doctored one on here saying everyone eats it up even if you use a ridiculous font in your poorly doctored photo. People who want to believe are quite easy to fool.

zelifcam@lemmy.world · 6 months ago

“Major new Technology still in Infancy Needs Improvements”

– headline every fucking day

TropicalDingdong@lemmy.world · 6 months ago

“Will this technology save us from ourselves, or are we just jerking off?”

NounsAndWords@lemmy.world · 6 months ago

GPT-2 came out a little more than 5 years ago; it answered 0% of questions accurately and couldn’t string a sentence together.

GPT-3 came out a little less than 4 years ago and was kind of a neat party trick, but I’m pretty sure it answered ~0% of programming questions correctly.

GPT-4 came out a little less than 2 years ago and can answer 48% of programming questions accurately.

I’m not talking about morality, or creativity, or good/bad for humanity, but if you don’t see a trajectory here, I don’t know what to tell you.

Eheran@lemmy.world · 6 months ago

The study is using 3.5, not version 4.

phoneymouse@lemmy.world · 6 months ago

4 produces inaccurate programming answers too

Eheran@lemmy.world · 6 months ago

Obviously. But it is FAR better yet again.

phoneymouse@lemmy.world · 6 months ago

Not really. I ask it questions all the time and it makes shit up.

Eheran@lemmy.world · 6 months ago

Yes. But it is better than 3.5 without any doubt.

Knock_Knock_Lemmy_In@lemmy.world · 6 months ago

In what year do you estimate AI will have 90% accuracy?

NounsAndWords@lemmy.world · 6 months ago

No clue? Somewhere between a few years (assuming some unexpected breakthrough) and many decades? The consensus from experts (of which I am not one) seems to be somewhere in the 2030s/40s for AGI. I’m guessing accuracy will probably be more on a topic-by-topic basis; LLMs might never even get there, or only for things they’ve been heavily trained on. If predictive text doesn’t do it, then I would be betting on whatever Yann LeCun is working on.

egeres@lemmy.world · 6 months ago

Lemmy seems to be very near-sighted when it comes to the exponential curve of AI progress. I think this is because the community is very anti-corp.

NotMyOldRedditName@lemmy.world · 6 months ago

My experience with an AI coding tool today.

Me: Can you optimize this method.

AI: Okay, here’s an optimized method.

Me seeing the AI completely removed a critical conditional check.

Me: Hey, you completely removed this check with variable xyz.

AI: Oops, you’re right, here you go, I fixed it.

It did this 3 times on 3 different optimization requests.

It was 0 for 3

Although there were some good suggestions once you got past the blatant first error.
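
One way to catch this kind of thing before it ships is to pin down the guarded behavior with a quick comparison against the original before accepting any "optimized" rewrite. A minimal sketch in Python, assuming a made-up method: `apply_discount`, its "optimized" variant, and the `is_member` check are all hypothetical stand-ins for the method and the xyz check described above, not the actual code from that session.

```python
def apply_discount(price, is_member):
    """Original method: only members get the 10% discount."""
    if not is_member:  # the critical conditional check (hypothetical)
        return price
    return round(price * 0.9, 2)


def apply_discount_optimized(price, is_member):
    """Hypothetical 'optimized' rewrite that silently dropped the check."""
    return round(price * 0.9, 2)


def find_mismatches(original, candidate, cases):
    """Return every input on which the candidate disagrees with the original."""
    return [args for args in cases if original(*args) != candidate(*args)]


# Cover both branches of the check, not just the happy path.
CASES = [(100.0, True), (100.0, False), (19.99, True), (19.99, False)]

mismatches = find_mismatches(apply_discount, apply_discount_optimized, CASES)
print(mismatches)  # [(100.0, False), (19.99, False)]
```

The non-member cases are the ones that expose the missing check, which is why the case list has to exercise both sides of the conditional.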

piecat@lemmy.world · 6 months ago

My favorite is when I ask for something and it gets stuck in a loop, pasting the same comment over and over

efstajas@lemmy.world · 6 months ago

Yeah it’s wrong a lot but as a developer, damn it’s useful. I use Gemini for asking questions and Copilot in my IDE personally, and it’s really good at doing mundane text editing bullshit quickly and writing boilerplate, which is a massive time saver. Gemini has at least pointed me in the right direction with quite obscure issues or helped pinpoint the cause of hidden bugs many times. I treat it like an intelligent rubber duck rather than expecting it to just solve everything for me outright.

Jimmyeatsausage@lemmy.world · 6 months ago

Same here. It’s good for writing your basic unit tests, and the explain feature is useful for getting your head wrapped around complex syntax, especially as bad as searching for useful documentation has gotten on Google and ddg.

Subverb@lemmy.world · 6 months ago

ChatGPT and GitHub Copilot are great tools, but they’re like a chainsaw: if you apply them incorrectly or become too casual and careless with them, they will kick back at you and fuck your day up.

corroded@lemmy.world · 6 months ago

I will resort to ChatGPT for coding help every so often. I’m a fairly experienced programmer, so my questions usually tend to be somewhat complex. I’ve found that it’s extremely useful for those problems that fall into the category of “I could solve this myself in 2 hours, or I could ask AI to solve it for me in seconds.” Usually, I’ll get a working solution, but almost every single time, it’s not a good solution. It provides a great starting-off point to write my own code.

Some of the issues I’ve found (speaking as a C++ developer) are: Variables not declared “const,” extremely inefficient use of data structures, ignoring modern language features, ignoring parallelism, using an improper data type, etc.

ChatGPT is great for generating ideas, but it’s going to be a while before it can actually replace a human developer. Producing code that works isn’t hard; producing code that’s good requires experience.

stufkes@lemmy.world · 6 months ago

This has been my experience as well. If you already know what you are doing, LLMs can be a great tool. If you are inexperienced, you can assess neither the quality nor the accuracy of the answers, and you are using the LLM to replace your own learning.

I like to draw the parallel to people that have learnt to paint only using digital tools. They often show a particular colouring that shows a lack of understanding of colour theory. Because pipette tools mean that you never have to mix colours, you never have to learn to do so. Painting with physical paint isn’t superior, but it presents a hurdle (mixing paint) that is crucial to learn to overcome. Many digital-only artists will still have learnt on traditional media. Once you have the knowledge, the pipette and colour pickers are just a tool, no longer inhibiting anything.

Sumocat@lemmy.world · 6 months ago

They’ve done studies: 48% of the time, it works every time.

Furbag@lemmy.world · 6 months ago

People downvote me when I point this out in response to “AI will take our jobs” doomerism.

Leate_Wonceslace@lemmy.dbzer0.com · 6 months ago

I mean, AI eventually will take our jobs, and with any luck it’ll be a good thing when that happens. Just because ChatGPT v3 (or w/e) isn’t up to the task doesn’t mean v12 won’t be.

NoLifeGaming@lemmy.world · 6 months ago

I’m not so sure about the “it’ll be good” part. I’d like to imagine a world where people don’t have to work because everything is done by robots but in reality you’ll have some companies that will make trillions while everyone else will go hungry and become poor and homeless.

Leate_Wonceslace@lemmy.dbzer0.com · 6 months ago

Yes, that’s exactly the scenario we need to avoid. Automated gay space communism would be ideal, but social democracy might do in a pinch. A sufficiently well-designed tax system coupled with a robust welfare system should make the transition survivable, but the danger with making that our goal is allowing the private firms enough political power that they can reverse the changes.

Furbag@lemmy.world · 6 months ago

Yes, this is also true. I see things like UBI as an inevitable necessity, because AI and automation in general will eliminate the need for most companies to employ humans. Our capitalistic system is set up in a way such that a person can sell their ability to work and provide value to the owner class, but if that dynamic is ever challenged on a fundamental level, it will violently collapse when people who can’t get jobs because a robot replaced them either reject automation to preserve the status quo or embrace a new dynamic that provides for the population’s basic needs without requiring them to be productive.

But the way that managers talk about AI makes it sound like the techbros have convinced everybody that AI is far more powerful than it currently is, which is a glorified chatbot with access to unfiltered Google search results.

assassin_aragorn@lemmy.world · 6 months ago

If it’s possible for AI to reach that level. We shouldn’t take for granted it’s possible.

I was really humbled when I learned that a cubic mm of human brain matter took over a petabyte to map. It suggests to me that AI is nowhere close to the level you’re describing.

Leate_Wonceslace@lemmy.dbzer0.com · 6 months ago

“It suggests to me that AI”

This is a fallacy. Specifically, I think you’re committing the informal fallacy of confusing necessary and sufficient conditions. That is to say, we know that if we can reliably simulate a human brain, then we can make an artificial sophont (this is true by mere definition). However, we have no idea what the minimum hardware requirements are for a sufficiently optimized program that runs a sapient mind. Note: I am setting aside what the definition of sapience is, because if you ask 2 different people you’ll get 20 different answers.

“We shouldn’t take for granted it’s possible.”

I’m pulling from a couple decades of philosophy and conservative estimates of the upper limits of what’s possible as well as some decently-founded plans on how it’s achievable. Suffice it to say, after immersing myself in these discussions for as long as I have I’m pretty thoroughly convinced that AI is not only possible but likely.

The canonical argument goes something like this: if brains are magic, we cannot say if humanlike AI is possible. If brains are not magic, then we know that natural processes can create sapience. Since natural processes can create sapience, it is extraordinarily unlikely that it will prove impossible to create it artificially.

So with our main premise (AI is possible) cogently established, we need to ask the question: “since it’s possible, will it be done, and if not why?” There are a great many advantages to AI, and while there are many risks, the barrier to entry for making progress is shockingly low. We are talking about the potential to create an artificial god with all the wonders and dangers that implies. It’s like a nuclear weapon if you didn’t need to source the uranium; everyone wants to have one, and no one wants their enemy to decide what it gets used for. So everyone has the incentive to build it (it’s really useful) and everyone has a very powerful disincentive against forbidding the research (there’s no way to stop everyone who wants to, and so the people who’d listen are the people who would make an AI who’ll probably be friendly). So what possible scenario do we have that would mean strong general AI (let alone the simpler things that’d replace everyone’s jobs) never gets developed? The answers range from total societal collapse to extinction, which are all worse than a bad transition to full automation.

So either AI steals everyone’s job or something worse happens.

assassin_aragorn@lemmy.world · 6 months ago

Thanks for the detailed and thought provoking response. I stand corrected. I appreciate the depth you went into!

Leate_Wonceslace@lemmy.dbzer0.com · 6 months ago

You’re welcome! I’m always happy to learn someone re-evaluated their position in light of new information that I provided. 🙂

disconnectikacio@lemmy.world · 6 months ago

Yes, there are mistakes, but if you point it in the right direction, it can give you correct answers.

agelord@lemmy.world · 6 months ago

In my experience, if you have the necessary skills to point it in the right direction, you don’t need to use it in the first place.

andallthat@lemmy.world · 6 months ago

It’s just a convenience, not a magic wand. Sure, relying on AI blindly and exclusively is a horrible idea (that lots of people peddle and quite a few suckers buy), but there’s room for supervised and careful use of AI, same as we started using Google instead of man pages and (grudgingly, for the older of us) tolerated the addition of syntax highlighting and even some code completion to all but the most basic text editors.

interdimensionalmeme@lemmy.ml · 6 months ago

Yesterday, I wrote all of this working JavaScript code: https://github.com/igorlogius/gather-from-tabs/discussions/8 And I don’t know a lick of JavaScript. I know other languages, but that was barely needed. I just gave it plain-language instructions and reported the errors until it worked.

aidan@lemmy.world · 6 months ago

It can, it also sometimes can’t unless you ask it “could it be x answer”

shotgun_crab@lemmy.world · 6 months ago

I always thought of it as a tool to write boilerplate faster, so no surprises for me

RagingSnarkasm@lemmy.world · 6 months ago

Better than Jerry in the next cubicle over.

aesthelete@lemmy.world · 6 months ago

Sounds low

interdimensionalmeme@lemmy.ml · 6 months ago

Yes, and even if it was only right 1% of the time it would still be amazing

Also hallucinations are not a universally bad thing.

cultsuperstar@lemmy.world · 6 months ago

Not a programmer by any means (haven’t done any since college) but I’ve asked it for help in writing Jira queries or Excel mess and it’s been pretty solid with that stuff.

Siegfried@lemmy.world · 6 months ago

Well, I do it 99% of the time