Readit News
scoots_k · 5 months ago
Moondream 2 has been very useful for me: I've been using it to automatically label object detection datasets for novel classes and then distill a CNN that is orders of magnitude smaller but similarly accurate.

One oddity is that I haven't seen the claimed improvements beyond the 2025-01-09 tag - subsequent releases improve recall but degrade precision pretty significantly. It'd be amazing if object detection VLMs like this reported class confidences to better address this issue. That said, having a dedicated object detection API is very nice and absent from other models/wrappers AFAIK.
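
For anyone wanting to reproduce this kind of precision/recall comparison between releases, here is a minimal sketch, assuming boxes are `(x_min, y_min, x_max, y_max)` tuples and a predicted box counts as correct when its IoU with an unmatched ground-truth box is at least 0.5. The function names and the threshold are illustrative choices, not anything from Moondream's tooling. Because the model reports no confidences, predictions are matched in the order given rather than ranked by score.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Box], truths: List[Box], iou_threshold: float = 0.5
) -> Tuple[float, float]:
    """Greedily match each prediction to its best unmatched ground-truth box."""
    unmatched = list(truths)
    true_positives = 0
    for pred in preds:
        best = max(unmatched, key=lambda t: iou(pred, t), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            unmatched.remove(best)  # each ground-truth box is matched at most once
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    return precision, recall


if __name__ == "__main__":
    truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [(0, 0, 10, 10), (50, 50, 60, 60), (21, 21, 30, 30)]
    print(precision_recall(preds, truths))
```

In the example at the bottom, both ground-truth boxes are found (full recall) but one of the three predictions is spurious, which is the "recall up, precision down" pattern described above.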

Looking forward to Moondream 3 post-inference optimizations. Congrats to the team. The founder Vik is a great follow on X if that's your thing.

radq · 5 months ago
Thanks! If you could shoot me a note at vik@m87.ai with any examples of the precision/recall issues you saw I'd appreciate it a ton.
buyucu · 5 months ago
are you planning to release a GGUF?
scoots_k · 5 months ago
Will do!
conwayanderson · 5 months ago
Also used it for auto-labeling - it's crazy good for that
ZeroCool2u · 5 months ago
Really impressive performance from the Moondream model, but looking at the results from the big 3 labs, it's absolutely wild how poorly Claude and OpenAI perform. Gemini isn't as good as Moondream, but it's clearly the only one that's even halfway decent at these vision tasks. I didn't realize how big a performance gap there was.
Jackson__ · 5 months ago
Funnily enough, Gemini is also the only one able to read a D20. ChatGPT consistently gets it wrong, and Claude mostly argues it can't read the face of the die that's facing up because it's obstructed (it's not lol).
KronisLV · 5 months ago
I'm not sure why they haven't been acquired yet by any of the big ones, since clearly Moondream is pretty good! Definitely seems like something Anthropic/OpenAI/whoever would want to fold into their platforms and such. Everyone involved in creating it should probably be swimming in money and visual use cases for LLMs should become far less useless with the reach of the big orgs.
ekidd · 5 months ago
Gemini is really fantastic at anything that's OCR-adjacent, and it promptly falls over on most other image-related tasks.
stephenbuilds · 5 months ago
Using moondream2 at paper.design to describe user uploaded images (for automatic labels in the layer tree). It's incredible, super fast and accurate. Excited to try out 3 :)
sheepscreek · 5 months ago
Impressive stuff! Has anyone tried it for computer/browser control? How does it fare with graphs and charts?
radq · 5 months ago
The 'point' skill is trained on a ton of UI data; we've heard of a lot of people using it in combination with a bigger driver model for UI automation. We are also planning on post-training it to work end-to-end for this in an agentic setting before the final release -- this was one of the main reasons we increased the model's context length.

Re: chart understanding, there are a lot of different types of charts out there, but it does fairly well! We posted ChartQA benchmarks in the blog: it's on par with GPT5* and slightly better than Gemini 2.5 Flash.

* To be fair to GPT5, it's going to work well on many more types of charts/graphs than Moondream. To be fair to Moondream, GPT5 isn't really well suited to deploy in a lot of vision AI applications due to cost/latency.

bobdyl87 · 5 months ago
I'm labeling a dataset with it. We’ll see how it turns out.
bobdyl87 · 5 months ago
Pretty good so far. Have 100,000 detections
Onavo · 5 months ago
How does it perform against the new Qwen3-VL model?
simonw · 5 months ago
This looks amazing. I'm a big fan of Gemini for bounding box operations, so the idea that a 9B model could outperform it is incredibly exciting!

I noticed that Moondream 2 was Apache 2 licensed but the 3 preview is currently BSL ("You can’t (without a deal): offer the model’s functionality to anyone outside your organization—e.g., an external API, or managed hosting for customers") - is that a permanent change to your licensing policies?

simonw · 5 months ago
I just noticed in https://huggingface.co/moondream/moondream3-preview/blob/mai... that the license is set to change to Apache 2 after two years.
bluelightning2k · 5 months ago
Spent 5 minutes trying to get basic pricing info for Moondream cloud. Seems it simply does not exist (or at least not until you've actually signed up?). There are 5,000 free requests, but I need to sense-check that the pricing is viable as step 0 of evaluating, long before hooking it up to an app.
civilchaos · 5 months ago
We are looking to launch our cloud very soon. We are still optimizing our inference to get you the best pricing we can offer. Follow @moondreamai on X if you want your ear to the ground for our launch!
aitchnyu · 5 months ago
Will you add this to OpenRouter too?
nicohayes · 5 months ago
The MoE architecture choice here is particularly interesting - the ability to keep only 2B parameters active while maintaining 8B model performance is a game-changer for edge deployment. I've been deploying vision models in production environments where latency is critical, and this sparse activation approach could solve the inference cost problem that's been limiting adoption of larger VLMs. The chart understanding capabilities mentioned look promising for automated document analysis workflows. Has anyone tested the model's consistency across different image qualities or lighting conditions? That's often where smaller models struggle compared to frontier ones.
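
To make the sparse activation idea concrete, here is a toy sketch of top-k mixture-of-experts routing in plain Python. It is not Moondream's architecture: the gate, the experts (simple linear maps) and the sizes are all hypothetical, chosen only to show that each input runs through `k` experts while the rest stay idle.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def moe_forward(
    x: Vector, gate: Matrix, experts: List[Matrix], k: int = 2
) -> Tuple[Vector, List[int]]:
    """Route x through the k highest-scoring experts and mix their outputs."""
    scores = matvec(gate, x)  # one routing score per expert
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]

    # Softmax over the selected experts only, shifted for numerical stability.
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]

    out = [0.0] * len(experts[0])
    for weight, i in zip(weights, top):
        y = matvec(experts[i], x)  # only the selected experts are evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, top


if __name__ == "__main__":
    x = [1.0, 0.0]
    gate = [[3.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 0.0]]  # 4 hypothetical experts
    experts = [[[i + 1.0, 0.0], [0.0, i + 1.0]] for i in range(4)]
    out, used = moe_forward(x, gate, experts, k=2)
    print(f"experts used: {used} of {len(experts)}, output: {out}")
```

With 4 experts and `k=2`, half of the expert weights are touched per input; the compute saving in a real model comes from the same mechanism at a much larger scale.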