

Exactly. I would almost give the AI 2027 authors credit for committing to a hard date… except they already have a subtly hidden asterisk in the original AI 2027 noting some of the authors have longer timelines. And I’ve noticed lots of hand-wringing and but achkshuallies in their lesswrong comments about the difference between mode and median and mean dates and other excuses.
Like see this comment chain https://www.lesswrong.com/posts/5c5krDqGC5eEPDqZS/analyzing-a-critique-of-the-ai-2027-timeline-forecasts?commentId=2r8va889CXJkCsrqY :
My timelines move dup to median 2028 before we published AI 2027 actually, based on a variety of factors including iteratively updating our models. But it was too late to rewrite the whole thing to happen a year later, so we just published it anyway. I tweeted about this a while ago iirc.
…You got your AI 2027 reposted like a dozen times to /r/singularity, maybe many dozens of times total across Reddit. The fucking vice president has allegedly read your fiction project. And you couldn’t be bothered to publish your best timeline?
So yeah, come 2028/2029, they already have a ready made set of excuse to backpedal and move back the doomsday prophecy.
So for most actual practical software development, writing tests is in fact an entire job in and of itself and its a tricky one because covering even a fraction of the use cases and complexity the software will actually face when deployed is really hard. So simply letting the LLMs brute force trial-and-error their code through a bunch of tests won’t actually get you good working code.
AlphaEvolve kind of did this, but it was testing very specific, well defined, well constrained algorithms that could have very specific evaluation written for them and it was using an evolutionary algorithm to guide the trial and error process. They don’t say exactly in their paper, but that probably meant generating code hundreds or thousands or even tens of thousands of times to generate relatively short sections of code.
I’ve noticed a trend where people assume other fields have problems LLMs can handle, but the actually competent experts in that field know why LLMs fail at key pieces.