livshitz · 2 years ago
I've written a post that may interest those of you working with GPT (or any other LLM) and wanting JSON as output. Here's a simple trick that can help reduce costs and improve response times.
loueed · 2 years ago
Interesting post, I've not used YAML outputs yet. When using GPT-3.5 for JSON, I found that requesting minified JSON reduces the token count significantly. In the example you mention, the month object minified is 28 tokens vs. 96 tokens formatted; it actually beats the 50 tokens returned from YAML (numbers you can check with the sketch below).

It seems the main issue is whitespace and indentation, which YAML requires but JSON does not.
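
To reproduce the comparison — a minimal sketch assuming Python with the tiktoken and PyYAML packages; the sample data here is hypothetical, not the actual month object from the post:

    import json
    import yaml      # pip install pyyaml
    import tiktoken  # pip install tiktoken

    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

    # Hypothetical sample data (stand-in for the month object).
    data = {"months": [{"name": "January", "days": 31},
                       {"name": "February", "days": 28}]}

    formatted = json.dumps(data, indent=2)
    minified = json.dumps(data, separators=(",", ":"))
    as_yaml = yaml.safe_dump(data)

    # Count the tokens each representation costs.
    for label, text in [("formatted JSON", formatted),
                        ("minified JSON", minified),
                        ("YAML", as_yaml)]:
        print(label, len(enc.encode(text)))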

livshitz · 2 years ago
Yes, minified JSON would be even fewer tokens than YAML. But: 1. LLMs tend to have a very hard time producing minified (compacted) JSON consistently in their output. 2. As for compacted JSON input: empirically, LLMs seem to process it quite well for basic cognitive tasks (information retrieval, basic Q&A, etc.), but on slightly more sophisticated tasks they do worse than on exactly the same input uncompressed. I've mentioned this and provided examples in the comments of the article.
keskival · 2 years ago
I wonder how well LLMs understand YAML Schema format.

I have found that providing a JSON Schema to them is an excellent way to reduce improvisation in outputs intended for machine consumption.
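
For example — a minimal sketch assuming Python with the jsonschema package; the schema and prompt wording are illustrative, not taken from the article:

    import json
    from jsonschema import validate  # pip install jsonschema

    # Illustrative schema constraining the shape of the model's output.
    schema = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "days": {"type": "integer"},
        },
        "required": ["name", "days"],
        "additionalProperties": False,
    }

    # Embed the schema in the prompt so the model knows the expected shape.
    prompt = ("Reply with JSON only, conforming to this JSON Schema:\n"
              + json.dumps(schema, indent=2))

    # Validate the model's reply before passing it downstream.
    def parse_reply(reply: str) -> dict:
        obj = json.loads(reply)
        validate(instance=obj, schema=schema)  # raises ValidationError on mismatch
        return obj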

villgax · 2 years ago
Might make more sense for invoking third-party APIs, but for self-run LLMs, TypeChat with JSON is just fine instead of adapting to YAML across your stack.
livshitz · 2 years ago
You don't need to adopt YAML across your stack. Instead of parsing a JSON string (the output from the LLM) into an object, I'm suggesting you parse a YAML string into an object. The article argues that the LLM -> YAML -> JSON path is more beneficial in terms of time and cost.
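
A minimal sketch of that path, assuming Python with PyYAML; the YAML reply shown is hypothetical:

    import json
    import yaml  # pip install pyyaml

    # Hypothetical YAML reply from the LLM.
    llm_reply = "name: January\ndays: 31\n"

    # Parse the YAML straight into a native object -- the same object
    # json.loads() would give you for an equivalent JSON reply.
    obj = yaml.safe_load(llm_reply)

    # If a downstream component insists on a JSON string, serialize it back.
    print(json.dumps(obj))  # {"name": "January", "days": 31}

The rest of your stack keeps working with plain objects (or JSON); only the parsing step at the LLM boundary changes.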