Because when the AI isn't cutting it, you always have the option to pull the plug and just do it manually. So the downside is bounded. In that way it's similar to the Mitch Hedberg joke: "I like an escalator, because an escalator can never break. It can only become stairs."
The absolute worst-case scenario is the situation where you think the AI is going to figure it out, so you keep prompting it, far past the time when you should've changed your approach or given up and done it manually.
You could have a codebase subtly broken on so many levels that you cannot fix it without starting from scratch - losing months.
You could slowly lose your ability to think and judge.
While they are generating tokens they have a state, and that state is recursively fed back through the network, and what is being fed back operates not just at the level of snippets of text but also of semantic concepts. So while it occurs in brief flashes, I would argue they have mental state and they have thoughts. If we built an LLM that generated tokens non-stop and could have user input mixed into the network input, it would not be a dramatic departure from today’s architecture.
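To make the feedback loop concrete, here is a minimal sketch of what I mean, using the Hugging Face transformers library and the small gpt2 checkpoint as stand-ins; the interleave_user_input() hook is hypothetical, only there to show where user text could be mixed into a never-ending generation stream.

```python
# Minimal sketch of the autoregressive loop: each generated token is appended
# to the context and fed back into the network, so the "state" at step t is a
# function of everything produced so far.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

context = tokenizer("The user asked:", return_tensors="pt").input_ids

def interleave_user_input(context):
    # Hypothetical hook: splice freshly arrived user tokens into the context.
    return context

with torch.no_grad():
    for _ in range(20):                        # in principle this loop never stops
        context = interleave_user_input(context)
        logits = model(context).logits         # forward pass over the whole context
        next_id = torch.argmax(logits[0, -1])  # greedy pick of the next token
        context = torch.cat([context, next_id.view(1, 1)], dim=-1)

print(tokenizer.decode(context[0]))
```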
It also clearly has goals, expressed in the RLHF tuning and the prompt. I call those goals because they directly determine its output, and I don’t know what a goal is other than the driving force behind a mind’s outputs. Base model training teaches it patterns; the finetuning and the prompt teach it how to apply those patterns and give it goals.
I don’t know what it would mean for a piece of software to have feelings or concerns or emotions, so I cannot say what essential quality LLMs are missing there. Consider this thought exercise: if we were to ever do an upload of a human mind, and it was executing on silicon, would they not be experiencing feelings because their thoughts are provably a deterministic calculation?
I don’t believe in souls, or at the very least I think they are a tall claim with insufficient evidence. In my view, neurons in the human brain are ultimately very simple deterministic calculating machines, and yet the full richness of human thought is generated from them because of chaotic complexity. For me, all human thought is pattern matching. The argument that LLMs cannot be minds because they only do pattern matching … I don’t know what to make of that. But then I also don’t know what to make of free will, so really what do I know?
Could it tell by itself that the solution is correct (one-shot)? Or did it just have great math intuition and knowledge? How were the solutions validated if it was 10-100-shot?
This is visible under the extreme time pressure of producing a working game in 72 hours (our team consistently scores top 100 in Ludum Dare, which is a somewhat high standard).
We use the popular Unity game engine, with which all LLMs have a wealth of experience (as with game development in general), but 80% of the output is so strangely "almost correct but not usable" that I can't afford the luxury of letting it figure things out, so I use it as fancy autocomplete. And I also still check docs and Stack Overflow-style forums a lot, because of stuff it plainly makes up.
One of the reasons may be that our game mechanics are often a bit off the beaten path, though the last game we made was literally a platformer with rope physics (the LLM could not produce a good idea for stable and simple rope physics that fit our constraints and was codeable in 3 hours).
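For the curious, the kind of "stable and simple" rope we were after is commonly built from Verlet integration plus iterated distance constraints (position-based dynamics). The sketch below is a generic illustration of that technique in plain Python, not our actual Unity code; the particle count, gravity, and iteration settings are made up.

```python
# Verlet rope: integrate positions, then repeatedly relax distance constraints
# between neighbouring particles. The relaxation pass is what keeps it stable.
GRAVITY = (0.0, -9.81)
SEGMENT_LENGTH = 0.25
ITERATIONS = 20          # more iterations -> stiffer, more stable rope
DT = 1.0 / 60.0

class Particle:
    def __init__(self, x, y, pinned=False):
        self.pos = [x, y]
        self.prev = [x, y]   # previous position encodes velocity implicitly
        self.pinned = pinned

def step(particles):
    # 1. Verlet integration: move each free particle by its implicit velocity plus gravity.
    for p in particles:
        if p.pinned:
            continue
        vx = p.pos[0] - p.prev[0]
        vy = p.pos[1] - p.prev[1]
        p.prev = list(p.pos)
        p.pos[0] += vx + GRAVITY[0] * DT * DT
        p.pos[1] += vy + GRAVITY[1] * DT * DT

    # 2. Constraint relaxation: push neighbouring particles back toward the rest length.
    for _ in range(ITERATIONS):
        for a, b in zip(particles, particles[1:]):
            dx = b.pos[0] - a.pos[0]
            dy = b.pos[1] - a.pos[1]
            dist = (dx * dx + dy * dy) ** 0.5 or 1e-9
            correction = (dist - SEGMENT_LENGTH) / dist * 0.5
            if not a.pinned:
                a.pos[0] += dx * correction
                a.pos[1] += dy * correction
            if not b.pinned:
                b.pos[0] -= dx * correction
                b.pos[1] -= dy * correction

rope = [Particle(i * SEGMENT_LENGTH, 0.0, pinned=(i == 0)) for i in range(10)]
for _ in range(600):   # simulate ~10 seconds at 60 Hz
    step(rope)
print(rope[-1].pos)    # free end has swung down under gravity
```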
But they definitely could be, and were [0]. You just employ multiple and cross-check, with every single one also able to double-check and correct errors.
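Back-of-the-envelope, that is the whole trick: if the checkers err independently, a majority vote drives the residual error down fast. A toy sketch (the 5% error rate and checker counts are made up for illustration):

```python
# Probability that a majority of n independent checkers, each wrong with
# probability p, agrees on a wrong answer. Independence is doing all the work.
from math import comb

def majority_error(n, p):
    k_min = n // 2 + 1   # more than half must be wrong
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

for n in (1, 3, 5, 7):
    print(n, round(majority_error(n, 0.05), 6))
# 1 -> 0.05, 3 -> ~0.00725, 5 -> ~0.00116, and shrinking further with n.
```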
LLMs cannot double-check, and multiples won't really help (I suspect ultimately for the same reason: exponential multiplication of errors [1]).
Not to mention the current token cost.