Meta Segment Anything Model 3

Released last week. Looks like all the weights are now out and published. Don’t sleep on the SAM 3D series — it’s seriously impressive. They have a human pose model which actually rigs and keeps multiple humans in a scene with objects, all from one 2D photo (!), and their straight object 3D model is by far the best I’ve played with - it got a really very good lamp with translucency and woven gems in usable shape in under 15 seconds.

Qwuke · 3 months ago

Between this and DINOv3, Meta is doing a lot for the SOTA even if Llama 4 came up short compared to the Chinese models.

nl · 3 months ago

https://ai.meta.com/blog/sam-3d/ for those interested.

Fraterkes · 3 months ago

Are those the actual wireframes they're showing in the demos on that page? As in, do the produced models have "normal" topology? Or are they still just kinda blobby with a ton of polygons

seanw265 · 3 months ago

I haven’t tried it myself, but if you’re asking specifically about the human models, the article says they’re not generating raw meshes from scratch. They extract the skeleton, shape, and pose from the input and feed that into their HMR system [0], which is a parametric human model with clean topology.

So the human results should have a clean mesh. But that’s separate from whatever pipeline they use for non-human objects.

[0]: https://github.com/facebookresearch/MHR

vessenes · 3 months ago

I’ve only used the playground. But I think they are actual meshes - they don’t have any of the weird splat noise at the edge of the objects, and they do not seem to show similar lighting artifacts to a typical splat rendering.

daemonologist · 3 months ago

For the objects I believe they're displaying Gaussian splats in the demo, but the model itself can also produce a proper mesh. The human poses are meshes (it's posing and adjusting a pre-defined parametric model).

visioninmyblood · 3 months ago

you can download them at https://github.com/facebookresearch/sam3. for 3d https://github.com/facebookresearch/sam-3d-objects

retinaros · 3 months ago

I looked quickly but it does not generate a 3d model file right?

Side question: what are the current top goto open models for image captioning and building image embeddings dbs, with somewhat reasonable hardware requirements?

daemonologist · 3 months ago

For pure image embedding, I find DINOv3 to be quite good. For multimodal embedding, maybe RzenEmbed. For captioning I would use a regular multimodal LLM, Qwen 3 or Gemma 3 or something, if your compute budget allows.

NitpickLawyer · 3 months ago

Try any of the qwen3-vl models. They have 8, 4 and 2B models in this family.

Glemkloksdjf · 3 months ago

I would suggest YOLO. Depending on your domain, you might also finetune these models. Its relativly easy as they are not big LLMs but either image classification or bounding boxes.

I would recommend bounding boxes.

jabron · 3 months ago

What do you mean "bounding boxes"? They were talking about captions and embeddings, so a vision language model is required.

smallerize · 3 months ago

Which YOLO?

trevorhlynn · 3 months ago

This was front page for a while last week

https://news.ycombinator.com/item?id=45982073

dang · 3 months ago

Thanks! Macroexpanded:

Meta Segment Anything Model 3 - https://news.ycombinator.com/item?id=45982073 - Nov 2025 (133 comments)

p.s. This was lobbed onto the frontpage by the second-chance pool (https://news.ycombinator.com/item?id=26998308) and I need to make sure we don't end up with duplicate threads that way.

stronglikedan · 3 months ago

what is old is new again

enoch2090 · 3 months ago

Surprisingly, SAM3 works bad on engineering drawings while SAM2 kinda works, and VLMs like Qwen3-VL works as well

zubiaur · 3 months ago

Had good luck with Gemini 2.5, SAM3 failed miserably with PIDs.

yeah I tried too. Im trying a fine tuning on PIDs.

Looking forward to your progress! Just checked the paper and it says the underlying backbone is still DETR. My guess would be that SAM3 uses more video frames during the training process and caused the dilution of sparse engineering-paper-like data.

the_duke · 3 months ago

phkahler · 3 months ago

Which (if any) of these models could run on a RaspberryPi for object recognition at several FPS?

aliljet · 3 months ago

I wonder how effective this is medical scenarios? Segmenting organs and tumors in cat scans or MRIs?

colkassad · 3 months ago

Been waiting days to get approval to download this from huggingface. What's up with that?

observationist · 3 months ago

Alternative downloads exist. You can find torrents, and match checksums against the HF downloads, but there are also mirrors and clones right there in HF which you can download without even having to log in.

Thanks, got it and it's working wonders for my use case.

knicholes · 3 months ago

I was approved within about 10 minutes for "Segment Anything 3"

tschellenbach · 3 months ago

same here, didn't get approval

shashanoid · 3 months ago

Miss the old segment anything page, used it a lot. This UI I found very complex to use

bradyriddle · 3 months ago

Same.

Checkout https://github.com/MiscellaneousStuff/meta-sam-demo

It's a rip of the previous sam playground. I use it for a bunch of things.

Sam 3 is incredible. I'm surprised it's not getting more attention.

> I'm surprised it's not getting more attention.

Remember, it's not the idea, it's the marketing!