OpenAI structured outputs are pretty stable for me. Gemini sometimes responds with a completely different structure.
Gemini 3 flash with grounding sometimes returns json inside ```json...``` causing parsing errors.
https://github.com/josdejong/jsonrepair
might be useful ( i am not the author )
I love the twist: reversing the friendly levels gives you a classic parser, and it opens up crazy experiments like whitespace weakening. Have you tested it on non-arithmetic ops (logical/bitwise) or more complex expressions like ((()))?