bikeshaving · 4 months ago
Does this mean we’ll finally get empirical proof for the aphorism “a picture is worth a thousand words”?

https://en.wikipedia.org/wiki/A_picture_is_worth_a_thousand_...

heltale · 4 months ago
I suppose it’s only worth 256 words at a time right now. ;)

https://arxiv.org/abs/2010.11929
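For context (my reading of the ViT paper linked above): the encoder splits an image into fixed-size patches and treats each patch as one token, so the token count is just (H/P) × (W/P). A quick sketch:

```python
def vit_token_count(height: int, width: int, patch: int = 16) -> int:
    """Number of patch tokens a ViT-style encoder produces for an image.

    Assumes the image divides evenly into patch×patch tiles, as with the
    ViT paper's 16×16 patches (the extra [CLS] token is not counted).
    """
    if height % patch or width % patch:
        raise ValueError("image dimensions must be multiples of the patch size")
    return (height // patch) * (width // patch)

print(vit_token_count(256, 256))  # 256 patches -> the "256 words" above
print(vit_token_count(224, 224))  # 196 patches for the paper's 224×224 input
```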

estebarb · 4 months ago
The CALM paper https://shaochenze.github.io/blog/2025/CALM/ says it is possible to compress 4 tokens into a single embedding, so: image = 4 × 256 = 1024 words > 1000 words. QED
floodfx · 4 months ago
Why are completion tokens higher with image prompts when the text output was about the same?
cma · 4 months ago
Some multimodal models may have a hidden captioning step that consumes completion tokens; others work on a fully native representation, and some do both, I think.
Garlef · 4 months ago
"Thinking" Mode
nunodonato · 4 months ago
it doesn't say that anywhere.


ashed96 · 4 months ago
In my experience, LLMs tend to take noticeably longer to process images than text.
weird-eye-issue · 4 months ago
It has to get the image data first, basically just IO time before processing it
ashed96 · 4 months ago
IIRC there's pre-processing (embedding/tokenization?) before feeding images to LLMs?

Hit this issue while optimizing LLM request times. Ended up lowering image resolution; lost some accuracy, but that was bearable.
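A rough sketch of that trade-off (hypothetical patch-based tokenizer with 16×16 patches, which is an assumption, not something the providers document identically): downscaling while preserving aspect ratio cuts the token count roughly quadratically with the scale factor.

```python
import math

def downscale_dims(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Shrink (width, height) so the longer side is at most max_side,
    preserving aspect ratio. No-op if the image is already small enough."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))

def approx_patch_tokens(width: int, height: int, patch: int = 16) -> int:
    """Approximate token count for a patch-based encoder,
    rounding partial edge patches up to a full patch."""
    return math.ceil(width / patch) * math.ceil(height / patch)

w, h = downscale_dims(2048, 1536, max_side=1024)      # -> (1024, 768)
print(approx_patch_tokens(2048, 1536))  # tokens at full resolution
print(approx_patch_tokens(w, h))        # ~4x fewer after halving each side
```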

psadri · 4 months ago
I wonder if these stay in the prefix cache?