I predict we'll get a few research breakthroughs in the next few years that will make articles like this seem ridiculous.
You’re right in that it’s obviously not the only problem.
But without solving this, it seems like no matter how good the models get, it'll never be enough.
Or, yes: the biggest research breakthrough we need is reliable, calibrated confidence. That alone would allow existing models, as they are, to become spectacularly more useful.
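To make "reliable calibrated confidence" concrete: a model is calibrated when the answers it labels 90% confident are actually right about 90% of the time. Here's a minimal sketch of how you'd measure that (expected calibration error, with made-up numbers; none of this comes from the article):

```python
# Minimal sketch of expected calibration error (ECE).
# All numbers below are invented for illustration.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by stated confidence and compare each bin's
    average confidence to its empirical accuracy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in the bin
    return ece

# A model that says "90% sure" but is right only ~60% of the time is badly
# calibrated, which is exactly the failure mode being discussed.
confs = [0.9, 0.9, 0.9, 0.9, 0.9, 0.6, 0.6, 0.6, 0.6, 0.6]
right = [1,   0,   1,   0,   1,   1,   1,   0,   1,   1]
print(f"ECE: {expected_calibration_error(confs, right):.2f}")
```

The research problem is getting models to emit confidences that make this number small, not computing the number itself.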
Is the goal to build a copilot for a data analyst, or to get business insight without going through an analyst?
If it's the latter, then IMHO no amount of text-to-SQL sophistication will solve the problem, because it's impossible for a non-analyst to tell whether the SQL is correct or sufficient.
These don’t seem like text2sql problems:
> Why did we hit only 80% of our daily ecommerce transactions yesterday?
> Why is customer acquisition cost trending up?
> Why was the campaign in NYC worse than the same campaign in SF?
1. The phrase "the only thing" massively underplays the difficulty of this problem. It's not a small thing.
2. One of the issues I've seen with a lot of chat LLMs is their willingness to correct themselves when asked. This might seem, on the surface, to be a positive (allowing a user to steer the AI toward a more accurate or appropriate solution), but in reality it simply plays into users' biases & makes it more likely that the user will accept & approve of incorrect responses from the AI. Often, rather than the user "correcting" the AI, they merely "teach" it to be confidently wrong in an amenable & subtle manner which that user finds easy to accept (or finds more difficult to spot).
If anything, unless/until we can solve the (insurmountable) problem of AI being wrong, AI should at least be trained to be confidently & stubbornly wrong (or right). This would also likely lead to better consistency in testing.
Humans have meta-cognition that helps them judge whether they're doing something that rests on lots of assumptions vs. something that's blessed.
Humans decouple planning from execution, right? Not fully, but we choose when to separate them and when not to.
If we had enough data of the form "here's a good plan given this user context" and "here's a bad plan," it doesn't seem unreasonable to build a pretty reliable meta-cognition capability for judging the goodness of a plan.
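If that labeled data existed, the simplest version of this meta-cognition signal is just a supervised scorer over (context, plan) pairs. A toy sketch, assuming scikit-learn and entirely made-up examples (a real system would need far more data and a stronger model than TF-IDF + logistic regression):

```python
# Toy sketch of "meta-cognition over plans": given labeled examples of good
# and bad plans (with their user context), train a classifier that scores
# new plans. All data and names here are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# (context, plan, label) triples; 1 = good plan, 0 = bad plan.
# These are stand-ins for whatever real judgment data you'd collect.
examples = [
    ("daily ecommerce transactions dropped", "segment by channel, compare day-over-day, check tracking outages", 1),
    ("daily ecommerce transactions dropped", "rewrite the whole pipeline from scratch", 0),
    ("CAC trending up", "break CAC down by campaign and cohort, compare spend vs. conversions", 1),
    ("CAC trending up", "assume it's seasonality and move on", 0),
]

texts = [f"{ctx} || {plan}" for ctx, plan, _ in examples]
labels = [label for _, _, label in examples]

scorer = make_pipeline(TfidfVectorizer(), LogisticRegression())
scorer.fit(texts, labels)

# Score a new plan for a new context: higher = more plausible plan.
candidate = "NYC campaign underperformed SF || compare audience sizes, creatives, and spend pacing"
print(scorer.predict_proba([candidate])[0][1])
```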