Response Healing: Reduce JSON defects by 80%+

Very confused. When you enable structured output the response should adhere to the JSON schema EXACTLY, not best effort, by constraining the output via guided decoding. This is even documented in OpenRouter's structured output doc

> The model will respond with a JSON object that strictly follows your schema

Gemini is listed as a model supporting structured output, and yet its fail rate is 0.39% (Gemini 2.0 Flash)!! I get that structured output has a high performance cost but advertising it as supported when in reality it's not is a massive red flag.

Worst yet response healing only fixes JSON syntax error, not schema adherence. This is only mentioned at the end of the article which people are clearly not going to read.

WTF

osaariki · 3 months ago

You're exactly right. The llguidance library [1,2] seems to have emerged as the go-to solution for this by virtue of being >10X faster than its competition. It's work from some past colleagues of mine at Microsoft Research based on theory of (regex) derivatives, which we perviously used to ship a novel kind of regex engine for .NET. It's cool work and AFAIK should ensure full adherence to a JSON grammar.

llguidance is used in vLLM, SGLang, internally at OpenAI and elsewhere. At the same time, I also see a non-trivial JSON error rate from Gemini models in large scale synthetic generations, so perhaps Google hasn't seen the "llight" yet and are using something less principled.

1: https://guidance-ai.github.io/llguidance/llg-go-brrr 2: https://github.com/guidance-ai/llguidance

red2awn · 3 months ago

Cool stuff! I don't get how all the open source inference framework have this down but the big labs doesn't...

Gemini [0] is falsely advertising this:

> This capability guarantees predictable and parsable results, ensures format and type-safety, enables the programmatic detection of refusals, and simplifies prompting.

[0]: https://ai.google.dev/gemini-api/docs/structured-output?exam...

>What about XML? The plugin can heal XML output as well - contact us if you’d like access.

Isn't this exactly how we got weird html parsing logic in the first place, with "autohealing" logic for mismatched closing tags or quotes?

AlexCoventry · 3 months ago

This is probably a bit different. An LLM outputs a token at a time ("autoregressively") by sampling from a per-position token probability distribution, which depends on all the prior context so far. While the post doesn't describe OpenRouter's approach, most structured LLM output works by putting a mask over that distribution, so that any token which would break the intended structure has probability zero and cannot be sampled. So for instance, in the broken example from the post,

    {"name": "Alice", "age": 30

the standard LLM output would have stopped there because the LLM output an end-of-sequence (EOS) token. But because that would lead to a syntax error in JSON, the EOS token would have probability zero, and it would be forced to either extend the number "30", or add more entries to the object, or end it with "}".

I haven't played much with structured output, but I imagine the biggest risk is that you may force the model to work with contexts outside its training data, leading it to produce garbage, though hopefully syntactically-correct garbage.

I don't understand, though, why the probability of incorrect JSON wouldn't go to 0, under this framework (unless you hit the max sequence length before the JSON ended.) The post implies that JSON errors still happen, so it's possible they're doing something else.

top1aibooster · 3 months ago

> Here's something most developers overlook: if an LLM has a 2% JSON defect rate, and Response Healing drops that to 1%, you haven't just made a 1% improvement. You've cut your defects, bugs, and support tickets in half.

If part of my system can't even manage to output JSON reliably, it needs way more "healing" than syntax munging. This comes across as naive.

Dylan16807 · 3 months ago

Plus, that claim isn't even true. A 1% and 2% JSON defect rate are going to annoy a similar amount of people into filing bugs and tickets.

fhcuvyxu · 3 months ago

1% fail rate on API requests is horrifyingly embarrassing.

0cf8612b2e1e · 3 months ago

Sounds like we are twice as close to AGI!

01HNNWZ0MV43FF · 3 months ago

"it's not just X, it's Y"

Don't you worry about Planet Express, let me worry about blank.

Spivak · 3 months ago

Model itself can't output JSON reliably. It's on you building a system around the model to make sure it either returns correct output or errors which is trivial to do.

arm32 · 3 months ago

But, but, you've just cut your defects, bugs, and support tickets in half!

Dead Comment

wat10000 · 3 months ago

I thought structured output was done by only allowing tokens that would produce valid output. For their example of a missing closing bracket, the end token wouldn't be allowed, and it would only accept tokens that contain a digit, comma, or closing bracket. I guess that must not be the case, though. Doing that seems like a better way to address this.

numlocked · 3 months ago

That is a way of doing that, but it's quite expensive computationally. There are some companies that can make it feasible [0], but it's often not a perfect process and different inference providers implement it different ways.

[0] https://dottxt.ai/

ViewTrick1002 · 3 months ago

I have used structured outputs both with OpenAI and the Gemini models. In the beginning they had some rough edges but lately it's been smooth sailing.

Seems like Openrouter also supports structured outputs.

https://openrouter.ai/docs/guides/features/structured-output...

xg15 · 3 months ago

Out of curiosity, why is it so expensive? Shouldn't constraining the possible result tokens make the inference less expensive? (because you have to calculate less logits and could occasionally even skip tokens entirely if there is only one valid option)

joshstrange · 3 months ago

I’d be (genuinely) interested to hear from people who think this will help. In my mind, if the JSON isn’t valid I wouldn’t trust a “healed” version of it to be correct either. I mean, I guess you just do schema validation on your end and so maybe fixing a missing comma/brace/etc is actually really helpful. I’ve not done JSON generation at scale to know.

gruez · 3 months ago

Deleted Comment

lijok · 3 months ago

One of the best shitposts I have ever seen, by far. Absurdism taken to its finest form.

culi · 3 months ago

I did some searching for an open-source version of this and found this pretty neat library for Elixir called json_remedy

https://github.com/nshkrdotcom/json_remedy