AI bots hallucinate software packages and devs download them

db0@lemmy.dbzer0.com · 1 year ago

RustyNova@lemmy.world · 1 year ago

*bad Devs

Always look on the official repository. Not just to see if it exists, but also to make sure it isn’t a fake/malicious one

db0@lemmy.dbzer0.com · 1 year ago

You’d be surprised how well someone who wants to can camouflage their package to look legit.

RustyNova@lemmy.world · 1 year ago

True. You can’t always be 100% sure. But a quick check of the download count and version count can help. And while searching for it in the repo, you can see other similarly named packages and avoid getting hit by a typosquatter.

Besides, it’s not just for security. What if the package you’re installing has a big banner in the readme that says “Deprecated and full of security issues”? It’s not a bad package per say, but still something you need to know
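A rough sketch of the quick check described above, in Python. Everything here is an assumption made for illustration: `PackageInfo` is a hypothetical record you would fill from whatever your registry exposes, and the thresholds are arbitrary. An empty result does not mean the package is safe; it only means none of these cheap checks fired.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PackageInfo:
    # Hypothetical record, not any registry's real API.
    name: str
    readme: str
    release_dates: list[date]  # one entry per published version


def red_flags(pkg: PackageInfo, today: date, min_versions: int = 3,
              min_age_days: int = 90) -> list[str]:
    """Return human-readable warnings; an empty list does NOT prove safety."""
    flags = []
    # Few published versions can hint at a freshly planted package.
    if len(pkg.release_dates) < min_versions:
        flags.append(f"only {len(pkg.release_dates)} version(s) published")
    # A very recent first release is another cheap signal.
    if pkg.release_dates:
        age = (today - min(pkg.release_dates)).days
        if age < min_age_days:
            flags.append(f"first release only {age} day(s) ago")
    # Look for the "big banner in the readme" case.
    readme = pkg.readme.lower()
    for word in ("deprecated", "unmaintained", "no longer maintained"):
        if word in readme:
            flags.append(f"readme mentions '{word}'")
    return flags


if __name__ == "__main__":
    pkg = PackageInfo("some-package", "Deprecated and full of security issues",
                      [date(2024, 3, 1)])
    for flag in red_flags(pkg, today=date(2024, 4, 1)):
        print(flag)
```

Plenty of legit packages are young or have few releases, so treat the output as a prompt to go read the repository page, not as a verdict.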

YoorWeb@lemmy.world · 1 year ago

*per se

https://en.m.wiktionary.org/wiki/per_se

RustyNova@lemmy.world · edit-2 1 year ago

Oh, TIL

Edit: *YourWeb

laughterlaughter@lemmy.world · 1 year ago

Oh, TIL.

Edit: *YourWeb.

KairuByte@lemmy.dbzer0.com · 1 year ago

Yeah, I’m confused on what the intent of the comment was. Apart from a code review, I don’t understand how someone would be able to tell that a package is fake. Unless they are grabbing it from a place with reviews/comments to warn them off.

KillingTimeItself@lemmy.dbzer0.com · 1 year ago

the first, most obvious sign is multiple identical packages, appearing to be the same thing, with weird stats and figures.

And possibly weird sizes. Usually people don’t try hard on package managing software, unless it’s an OS for some reason.

KairuByte@lemmy.dbzer0.com · 1 year ago

Unless you’re cross checking every package, you’re not going to know that there are multiple packages. And a real package doesn’t necessarily give detailed information on what it does, meaning you can easily mistake real packages for fake ones when using this as a test.

The real answer is to not trust AI outputs, but there is no perfect answer to this since those fake packages can easily be put up and sound like real ones with a cursory check.

KillingTimeItself@lemmy.dbzer0.com · 1 year ago

depends on how you integrate it i suppose. A system that abstracts that is pretty awful.

At the very least, you should be wary of there being more than one package, without explicit reason for such.
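For the "more than one package" check, a minimal sketch using Python's standard `difflib`. The function name `lookalikes`, the 0.8 cutoff, and the idea that you already have a list of known names (from a lockfile or a registry search, say) are all assumptions for the example. It only compares spellings, so it says nothing about which of the near-identical packages is the real one.

```python
import difflib


def lookalikes(candidate: str, known_names: list[str],
               cutoff: float = 0.8) -> list[str]:
    """Return known package names that are suspiciously close to `candidate`."""
    # Treat case and '_' vs '-' as equivalent for the comparison.
    normalized = candidate.lower().replace("_", "-")
    pool = {n.lower().replace("_", "-") for n in known_names}
    pool.discard(normalized)  # an exact match is not a look-alike
    # get_close_matches ranks by SequenceMatcher ratio, best match first.
    return difflib.get_close_matches(normalized, sorted(pool), n=5, cutoff=cutoff)


if __name__ == "__main__":
    print(lookalikes("reqeusts", ["requests", "numpy", "flask"]))
```

If this returns anything for a name an LLM handed you, that is a reason to stop and look at both packages before installing either.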

KillingTimeItself@lemmy.dbzer0.com · edit-2 1 year ago

we just experienced this with LZMA on Debian according to recent reports. 2 years of either manufactured dev history, or one very, very weird episode.

UmeU@lemmy.world · 1 year ago

That’s what my ex wife used to say

anlumo@lemmy.world · 1 year ago

I just want an LLM with a reasonable context window so we can actually write real working packages with it.

The demos look great, but it’s always just around 100 lines of code, which is beginner level. The only use case right now is fake packages.

db0@lemmy.dbzer0.com · edit-2 1 year ago

Just use the AI Horde. iirc our standard is like 4K context and some people host up to 8K. Here’s a frontend

RatBin@lemmy.world · 1 year ago

I have tried the Copilot integration in Edge out of curiosity, and if you feed the AI the context of the page the response can be useful. There is a catch, tho:

when opening a document, the accepted formats are html, txt, pdf. The documentation of a software package can be summarized, but the source will be the context of the page and not a web search, which is good in this case
when generating new information, the model can be far too synthetic, cutting out potentially useful information.

I still think you need to read the documentation yourself, maybe using the AI integration only when you need a general idea of the document.

What I do is first read the summary of the document bullet point by bullet point, then read the pdf file as a whole. By the time I do so, the LLM has given enough of a structure to facilitate my reading…

Cosmic Cleric@lemmy.world · 1 year ago

From the article…

hallucinated software packages – package names invented by generative AI models, presumably during project development

Flying Squid@lemmy.world · 1 year ago

It’s 2024. No more quality control, no more double-checking, not in any industry at this point. We’re all alpha testers. Not even beta testers.

As the old entertainment industry adage goes when anything goes wrong on the set, “we’ll fix it in post.”

boatsnhos931@lemmy.world · 1 year ago

Lie… no hallucinate…they lie and make shit up… just like a real hooman!! :))

KillingTimeItself@lemmy.dbzer0.com · 1 year ago

daily PSA that something like [insert number of packages] are deprecated on shipment of software.

Thanks guys, very cool.