I predict we'll get a few research breakthroughs in the next few years that will make articles like this seem ridiculous.
You’re right in that it’s obviously not the only problem.
But without solving this, it seems like no matter how good the models get, it'll never be enough.
Or rather, yes: the biggest research breakthrough we need is reliable, calibrated confidence. That would allow existing models, as they are, to become spectacularly more useful.
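For concreteness, here's a rough sketch (in Python, with made-up numbers) of what "calibrated" would even mean: when the model reports 90% confidence it should be right about 90% of the time, and Expected Calibration Error measures how far off it is. The function and the eval data below are purely illustrative.

```python
# Sketch: measuring calibration of a model's stated confidence.
# All numbers below are hypothetical.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: model-reported probabilities in [0, 1];
    correct: 1 if the answer was actually right, else 0."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # gap between average accuracy and average stated confidence in this bin
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return ece

# Hypothetical eval: stated confidence vs. whether the answer was actually right.
stated = [0.95, 0.90, 0.80, 0.99, 0.60, 0.85]
was_right = [1, 0, 1, 1, 0, 1]
print(f"ECE = {expected_calibration_error(stated, was_right):.3f}")
```

A model with near-zero ECE could be trusted to say "I'm not sure" and mean it, which is what would make today's models so much more useful without any capability gains.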
1. The phrase "the only thing" massively underplays the difficulty of this problem. It's not a small thing.
2. One of the issues I've seen with a lot of chat LLMs is their willingness to correct themselves when asked. On the surface this looks like a positive (it lets a user steer the AI toward a more accurate or appropriate solution), but in reality it plays into users' biases and makes it more likely that the user will accept and approve of incorrect responses. Often the AI isn't really "correcting" itself; the exchange merely teaches it how to be confidently wrong in an amenable, subtle manner that the individual user finds easy to accept (or more difficult to spot).
If anything, unless/until we can solve the (insurmountable) problem of AI being wrong, AI should at least be trained to be confidently & stubbornly wrong (or right). This would also likely lead to better consistency in testing.
Humans have meta-cognition that helps them judge whether they're doing something built on lots of assumptions vs. doing something that's already blessed and well-trodden.
Humans decouple planning from execution, right? Not fully, but we choose when to separate the two and when not to.
If we had enough data of the form "here's a good plan given this user context" and "here's a bad plan," it doesn't seem unreasonable to build a pretty reliable meta-cognition capability that judges the goodness of a plan.
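As a toy sketch of that idea (all plans and labels below are made up, and a real system would use a much stronger model and far richer context), you could train a scorer over labeled good/bad plans and consult it before executing anything:

```python
# Sketch: a plan-"goodness" scorer trained from labeled examples.
# Data and labels are hypothetical; this only illustrates the shape of the idea.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

plans = [
    "Back up the database, run the migration on staging, verify, then deploy.",
    "Run the migration directly in production and see if anything breaks.",
    "Write a failing test, implement the fix, run the full suite, open a PR.",
    "Rewrite the whole module from scratch; tests can come later.",
]
labels = [1, 0, 1, 0]  # 1 = good plan, 0 = bad plan (hypothetical judgments)

# TF-IDF + logistic regression stands in for whatever model you'd actually use.
scorer = make_pipeline(TfidfVectorizer(), LogisticRegression())
scorer.fit(plans, labels)

candidate = "Take a backup, test the change on staging, then roll out gradually."
print("P(good plan) =", scorer.predict_proba([candidate])[0][1])
```

The point isn't the classifier; it's that separating "score the plan" from "execute the plan" gives you a place to put calibrated confidence before any damage is done.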