radq (u/radq) - Readit News

radq commented on Moondream 3 Preview: Frontier-level reasoning at a blazing speed moondream.ai/blog/moondre... · Posted by u/kristianp

sheepscreek · 6 months ago

Impressive stuff! Has anyone tried it for computer/browser control? How does it fare with graphs and charts?

radq · 6 months ago

The 'point' skill is trained on a ton of UI data; we've heard of a lot of people using it in combination with a bigger driver model for UI automation. We are also planning on post-training it to work end-to-end for this in an agentic setting before the final release -- this was one of the main reasons we increased the model's context length.

Re: chart understanding, there are a lot of different types of charts out there, but it does fairly well! We posted ChartQA benchmarks in the blog: it's on par with GPT5* and slightly better than Gemini 2.5 Flash.

* To be fair to GPT5, it's going to work well on many more types of charts/graphs than Moondream. To be fair to Moondream, GPT5 isn't really well suited to deploy in a lot of vision AI applications due to cost/latency.

radq commented on Moondream 3 Preview: Frontier-level reasoning at a blazing speed moondream.ai/blog/moondre... · Posted by u/kristianp

scoots_k · 6 months ago

Moondream 2 has been very useful for me: I've been using it to automatically label object detection datasets for novel classes and distill a CNN that is orders of magnitude smaller but similarly accurate.

One oddity is that I haven't seen the claimed improvements beyond the 2025-01-09 tag - subsequent releases improve recall but degrade precision pretty significantly. It'd be amazing if object detection VLMs like this reported class confidences to better address this issue. That said, having a dedicated object detection API is very nice and absent from other models/wrappers AFAIK.

Looking forward to Moondream 3 post-inference optimizations. Congrats to the team. The founder Vik is a great follow on X if that's your thing.

radq · 6 months ago

Thanks! If you could shoot me a note at vik@m87.ai with any examples of the precision/recall issues you saw I'd appreciate it a ton.

radq commented on Tokasaurus: An LLM inference engine for high-throughput workloads scalingintelligence.stanf... · Posted by u/rsehrlich

radq · 9 months ago

Cool project! The codebase is simple and well documented, a good starting point for anyone interested in how to implement a high-performance inference engine. The prefix sharing is very relevant for anyone running batch inference to generate RL rollouts.

radq commented on Moondream 0.5B: The Smallest Vision-Language Model moondream.ai/blog/introdu... · Posted by u/BUFU

radq · a year ago

Hello folks, I work on moondream. Posted a demo video on twitter for this release: https://x.com/vikhyatk/status/1864727630093934818

Happy to answer any questions!

radq commented on Jeff Dean responds to EDA industry about AlphaChip twitter.com/JeffDean/stat... · Posted by u/nsoonhui

bushbaba · a year ago

H100 GPU instances are multiple orders of magnitude more expensive.

radq · a year ago

Not true, H100s cost $2-3/GPU/hr on the open market.

radq commented on How Meta trains large language models at scale engineering.fb.com/2024/0... · Posted by u/mfiguiere

bluedino · 2 years ago

We have almost 400 H100s sitting idle. I wonder how many other companies are buying millions of dollars' worth of these chips in the hope of using them, only to leave them unutilized?

radq · 2 years ago

Have you considered sponsoring an open-source project? ;)

radq commented on Qwen1.5-Moe: Matching 7B Model Performance with 1/3 Activated Parameters qwenlm.github.io/blog/qwe... · Posted by u/GaggiX

radq · 2 years ago

1/3rd "activated parameters", while also requiring 2x the VRAM.

radq commented on New algorithm unlocks high-resolution insights for computer vision news.mit.edu/2024/featup-... · Posted by u/zerojames

radq · 2 years ago

The training technique used here (fitting something similar to a NeRF to different views of the same image) closely resembles the one in this paper, which uses it to denoise (instead of upscale) output features: https://arxiv.org/abs/2401.02957

Posted by u/radq 2 years ago

Show HN: Moondream, a small vision language model that runs on 8GB of RAM github.com/vikhyat/moondr...

radq commented on Tell HN: YC company Anima Health is spamming email addresses posted to HN · Posted by u/catharsisatlast

wantlotsofcurry · 2 years ago

I received the same email after the first time I posted to the monthly “Who wants to be hired?” thread. Gross, really.

radq · 2 years ago

I'm confused - you posted in the "who wants to be hired" thread, and then got an email from this company asking if you'd be interested?

u/radq

Karma: 508 · Cake day: October 17, 2010

About

vik@moondream.ai
