Readit News
radq commented on Moondream 3 Preview: Frontier-level reasoning at a blazing speed   moondream.ai/blog/moondre... · Posted by u/kristianp
sheepscreek · 6 months ago
Impressive stuff! Has anyone tried it for computer/browser control? How does it fare with graphs and charts?
radq · 6 months ago
The 'point' skill is trained on a ton of UI data; we've heard of a lot of people using it in combination with a bigger driver model for UI automation. We are also planning on post-training it to work end-to-end for this in an agentic setting before the final release -- this was one of the main reasons we increased the model's context length.

Re: chart understanding, there are a lot of different types of charts out there but it does fairly well! We posted ChartQA benchmarks in the blog: it's on par with GPT5* and slightly better than Gemini 2.5 Flash.

* To be fair to GPT5, it's going to work well on many more types of charts/graphs than Moondream. To be fair to Moondream, GPT5 isn't really well suited to deploy in a lot of vision AI applications due to cost/latency.

radq commented on Moondream 3 Preview: Frontier-level reasoning at a blazing speed   moondream.ai/blog/moondre... · Posted by u/kristianp
scoots_k · 6 months ago
Moondream 2 has been very useful for me: I've been using it to automatically label object detection datasets for novel classes and distill a CNN that is orders of magnitude smaller but similarly accurate.

One oddity is that I haven't seen the claimed improvements beyond the 2025-01-09 tag - subsequent releases improve recall but degrade precision pretty significantly. It'd be amazing if object detection VLMs like this reported class confidences to better address this issue. That said, having a dedicated object detection API is very nice and absent from other models/wrappers AFAIK.

Looking forward to Moondream 3 post-inference optimizations. Congrats to the team. The founder Vik is a great follow on X if that's your thing.
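
For anyone wanting to quantify the precision/recall trade-off described above, here is a minimal sketch of how the two numbers are commonly computed for a single image. It assumes boxes in `(x_min, y_min, x_max, y_max)` form and a greedy one-to-one match at an IoU threshold of 0.5. The function names and the threshold are illustrative assumptions, not anything Moondream ships.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Box], truths: List[Box], iou_threshold: float = 0.5
) -> Tuple[float, float]:
    """Greedy matching: each ground-truth box can be claimed by one prediction."""
    unmatched = list(truths)
    true_positives = 0
    for pred in preds:
        # Pick the best-overlapping ground-truth box that is still unclaimed.
        best = max(unmatched, key=lambda t: iou(pred, t), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    return precision, recall
```

An extra spurious box lowers precision without touching recall, which is the pattern reported above. Without per-box confidences there is no score threshold to tune, so the only lever is comparing release tags on the same labeled sample.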

radq · 6 months ago
Thanks! If you could shoot me a note at vik@m87.ai with any examples of the precision/recall issues you saw I'd appreciate it a ton.
radq commented on Tokasaurus: An LLM inference engine for high-throughput workloads   scalingintelligence.stanf... · Posted by u/rsehrlich
radq · 9 months ago
Cool project! The codebase is simple and well documented, a good starting point for anyone interested in how to implement a high-performance inference engine. The prefix sharing is very relevant for anyone running batch inference to generate RL rollouts.
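
To make the rollout point concrete, here is a rough sketch of why prefix sharing matters when many completions start from the same prompt. It only counts the prompt tokens that need a prefill pass with and without sharing. It is a simplified illustration with a hypothetical function name, not how Tokasaurus implements it.

```python
from typing import List, Sequence


def prefill_tokens(sequences: List[Sequence[int]], share_prefixes: bool) -> int:
    """Count tokens that must be prefilled for a batch of token sequences.

    Without sharing, every sequence is processed in full. With ideal sharing,
    each distinct prefix is processed once, so the cost is the number of
    distinct prefixes (the node count of a trie built over the batch).
    """
    if not share_prefixes:
        return sum(len(seq) for seq in sequences)
    distinct_prefixes = set()
    for seq in sequences:
        for end in range(1, len(seq) + 1):
            distinct_prefixes.add(tuple(seq[:end]))
    return len(distinct_prefixes)


# Three rollouts that share the prompt tokens [1, 2] and then diverge.
batch = [[1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 6]]
print(prefill_tokens(batch, share_prefixes=False))  # 11
print(prefill_tokens(batch, share_prefixes=True))   # 6
```

With RL rollouts the shared part is usually a long prompt sampled many times, so the gap grows roughly with the number of samples per prompt.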
radq commented on Moondream 0.5B: The Smallest Vision-Language Model   moondream.ai/blog/introdu... · Posted by u/BUFU
radq · a year ago
Hello folks, I work on moondream. Posted a demo video on twitter for this release: https://x.com/vikhyatk/status/1864727630093934818

Happy to answer any questions!

radq commented on Jeff Dean responds to EDA industry about AlphaChip   twitter.com/JeffDean/stat... · Posted by u/nsoonhui
bushbaba · a year ago
H100 GPU instances are multiple orders of magnitude more expensive.
radq · a year ago
Not true, H100s cost $2-3/GPU/hr on the open market.
radq commented on How Meta trains large language models at scale   engineering.fb.com/2024/0... · Posted by u/mfiguiere
bluedino · 2 years ago
We have almost 400 H100s sitting idle. I wonder how many other companies are buying millions of dollars' worth of these chips hoping to use them, only to leave them underutilized?
radq · 2 years ago
Have you considered sponsoring an open-source project? ;)
radq commented on Qwen1.5-Moe: Matching 7B Model Performance with 1/3 Activated Parameters   qwenlm.github.io/blog/qwe... · Posted by u/GaggiX
radq · 2 years ago
1/3rd "activated parameters", while also requiring 2x the VRAM.
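
A back-of-the-envelope sketch of the trade-off: per-token compute scales with the activated parameters, but every expert has to be resident in memory, so VRAM scales with the total. The parameter counts below are round illustrative numbers chosen to match the "1/3" and "2x" ratios, not figures taken from the Qwen blog post, and 2 bytes per parameter assumes 16-bit weights.

```python
def weight_memory_gb(total_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return total_params * bytes_per_param / 1e9


# Illustrative assumptions, not published model sizes.
dense_params = 7e9                 # dense baseline: all parameters active per token
moe_total_params = 14e9            # MoE: every expert must be loaded
moe_active_params = dense_params / 3   # MoE: parameters used per token

dense_gb = weight_memory_gb(dense_params)
moe_gb = weight_memory_gb(moe_total_params)

print(f"dense weights: {dense_gb:.0f} GB, MoE weights: {moe_gb:.0f} GB")
print(f"memory ratio: {moe_gb / dense_gb:.1f}x")
print(f"active-parameter ratio: {moe_active_params / dense_params:.2f}")
```

This ignores the KV cache and activations, which add to both sides.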
radq commented on New algorithm unlocks high-resolution insights for computer vision   news.mit.edu/2024/featup-... · Posted by u/zerojames
radq · 2 years ago
The training technique used here (fitting something similar to a NeRF to different views of the same image) closely resembles the one in this paper, which uses it to denoise (instead of upscale) output features: https://arxiv.org/abs/2401.02957
radq commented on Tell HN: YC company Anima Health is spamming email addresses posted to HN    · Posted by u/catharsisatlast
wantlotsofcurry · 2 years ago
I received the same email after the first time I posted to the monthly “Who wants to be hired?” thread. Gross, really.
radq · 2 years ago
I'm confused - you posted in the "who wants to be hired" thread, and then got an email from this company asking if you'd be interested?

u/radq

Karma: 508 · Cake day: October 17, 2010
About
vik@moondream.ai