@mm_maybe

mm_maybe@sh.itjust.works · 8 days ago

this is learning completely the wrong lesson. it has been well-known for a long time and very well demonstrated that smaller models trained on better-curated data can outperform larger ones trained using brute force “scaling”. this idea that “bigger is better” needs to die, quickly, or else we’re headed towards not only an AI winter but an even worse climate catastrophe as the energy requirements of AI inference on huge models obliterate progress on decarbonization overall.

mm_maybe@sh.itjust.works · 8 days ago

those are all classification problems, which is a fundamentally different kind of problem with less open-ended solutions, so it’s not surprising that they are easier to train and deploy.

mm_maybe@sh.itjust.works · 10 days ago

I really wish it were easier to fine-tune and run inference on GPT-J-6B as well… that was a gem of a base model for research purposes, and for a hot minute circa Dolly there were finally some signs it would become more feasible to run locally. But all the effort going into llama.cpp and GGUF kinda left GPT-J behind. GPT4All used to support it, I think, but last I checked the documentation had huge holes as to how exactly that’s done.

mm_maybe@sh.itjust.works · 10 days ago

One of the reasons I love StarCoder, even for non-coding tasks. Trained only on Github means no “instruction finetuning” bullshit ChatGPT-speak.