I've been wanting them to do this for questions like "what is your context length?" for ages - it frustrates me how badly ChatGPT handles questions about its own abilities; it feels like that would be worth supporting with some kind of special case or RAG mechanism.
Not to mention that dynamically modifying the base system prompt would likely break their entire caching mechanism, given that caching is based on longest-prefix matching, and I can't imagine the model's system prompt is somehow excluded from that.
If that's too scary, the failed tool call could trigger another AI to go draft up a PR with that proposed tool, since hey, it's cheap and might be useful.
Dynamic, on-the-fly generation & execution is definitely fascinating to watch in a sandbox, but is far too scary (from a compliance/security/sanity perspective) without spending a lot more time on guardrails.
We do, however, take note of hallucinated tool calls and have had the model suggest an implementation to start from - several such tools are in production now.
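For a rough sense of the shape (a simplified sketch, not our production code - the OpenAI SDK usage, tool names, and file paths below are just illustrative):

    import json
    import os
    from openai import OpenAI

    client = OpenAI()

    # Tools we actually implement, keyed by name (bodies here are placeholders).
    REGISTERED_TOOLS = {
        "search_orders": lambda args: {"results": []},
    }

    def handle_tool_call(call):
        """Dispatch a tool call; if the model invented a tool, record it and
        ask for a draft implementation instead of just failing."""
        name = call.function.name
        args = json.loads(call.function.arguments or "{}")

        if name in REGISTERED_TOOLS:
            return REGISTERED_TOOLS[name](args)

        # Hallucinated tool: note it, then have the model propose an
        # implementation that a human reviews before anything ships.
        draft = client.chat.completions.create(
            model="gpt-4o",  # illustrative model name
            messages=[
                {"role": "system",
                 "content": "You draft Python tool implementations for human review."},
                {"role": "user",
                 "content": (f"An agent tried to call a tool named '{name}' with arguments "
                             f"{json.dumps(args)}, but no such tool exists. Propose a Python "
                             "function implementing it, with a docstring describing inputs and outputs.")},
            ],
        )
        os.makedirs("proposed_tools", exist_ok=True)
        with open(f"proposed_tools/{name}.py.draft", "w") as f:
            f.write(draft.choices[0].message.content)
        return {"error": f"tool '{name}' does not exist; a draft implementation was filed for review"}

The key point is that nothing generated ever executes on its own - the draft just lands somewhere a human will look at it.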
It's also useful to spin up any completed agents and interrogate them about what tools they might have found useful during execution (or really any other post-process questionnaire you can think of).
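The debrief itself can be as simple as replaying the transcript with one extra question tacked on (again just a sketch with made-up prompt wording):

    from openai import OpenAI

    client = OpenAI()

    def debrief_agent(transcript):
        """transcript: the completed agent run as a list of chat messages."""
        questionnaire = (
            "The task is complete. Looking back over this run: which tools would have "
            "made it easier if they had existed, and which steps did you work around "
            "manually that a dedicated tool could have handled?"
        )
        resp = client.chat.completions.create(
            model="gpt-4o",  # illustrative model name
            messages=transcript + [{"role": "user", "content": questionnaire}],
        )
        return resp.choices[0].message.content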