ulrikhansen54 commented on SAM 2: Segment Anything in Images and Videos   github.com/facebookresear... · Posted by u/xenova
nravi20 · a year ago
Hi from the Segment Anything team! Today we’re releasing Segment Anything Model 2! It's the first unified model for real-time promptable object segmentation in images and videos! We're releasing the code, models, dataset, research paper and a demo! We're excited to see what everyone builds! https://ai.meta.com/blog/segment-anything-2/
ulrikhansen54 · a year ago
Awesome model - thank you! Are you guys planning to provide any guidance on fine-tuning?
ulrikhansen54 commented on SAM 2: Segment Anything in Images and Videos   github.com/facebookresear... · Posted by u/xenova
swyx · a year ago
I covered SAM 1 a year ago (https://news.ycombinator.com/item?id=35558522). Notes from a quick read of the SAM 2 paper: https://ai.meta.com/research/publications/sam-2-segment-anyt...

1. SAM 2 was trained on 256 A100 GPUs for 108 hours (SAM 1 was 68 hrs on the same cluster). Taking the upper-end $2/hr A100 price off gpulist, that's 256 × 108 ≈ 27.6k GPU-hours, or ~$55k to train - surprisingly cheap for adding video understanding?

2. new dataset: the new SA-V dataset is "only" 50k videos, with careful attention given to scene/object/geographic diversity, including that of the annotators. I wonder if LAION or DataComp (AFAICT the only other real players in the open image-data space) can reach this standard.

3. bootstrapped annotation: similar to SAM 1, a 3-phase approach in which 16k initial annotations across 1.4k videos were then expanded by 63k + 197k more with SAM 1/SAM 2 assistance, with annotation time accelerating dramatically (89% faster than SAM 1 alone) by the end

4. memory attention: SAM 2 is a transformer with memory across frames! Special "object pointer" tokens are stored in a "memory bank" - a FIFO queue of memories from recent and prompted frames (rough sketch at the end of this comment). Has this been explored in language models? whoa?

(written up in https://x.com/swyx/status/1818074658299855262)
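
Since point 4 is the most novel bit, here's a rough PyTorch sketch of how I read the memory-bank idea - class names, dimensions, and the eviction details are my guesses from the paper, not Meta's actual code:

    # Sketch only: a FIFO memory bank of per-frame features plus separately
    # stored prompted-frame memories, with cross-attention from the current frame.
    from collections import deque
    import torch
    import torch.nn as nn

    class MemoryBank:
        def __init__(self, max_recent=6):
            self.recent = deque(maxlen=max_recent)  # FIFO: oldest recent-frame memory is evicted
            self.prompted = []  # prompted-frame memories kept separately (simplified: unbounded here)

        def add(self, mem, is_prompted=False):
            (self.prompted if is_prompted else self.recent).append(mem)

        def tokens(self):
            # All stored memories as one token sequence: (B, N, D)
            return torch.cat(list(self.prompted) + list(self.recent), dim=1)

    class MemoryAttention(nn.Module):
        def __init__(self, dim=256, heads=8):
            super().__init__()
            self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, frame_feats, bank):
            # Current-frame features cross-attend to the memory bank, so
            # segmentation is conditioned on past frames and prompts.
            mem = bank.tokens()
            out, _ = self.cross(frame_feats, mem, mem)
            return out

    # Toy usage: condition each frame on memories of earlier frames.
    bank, attn = MemoryBank(), MemoryAttention()
    for t in range(10):
        feats = torch.randn(1, 64 * 64, 256)  # stand-in for image-encoder output
        if bank.prompted or bank.recent:
            feats = attn(feats, bank)
        bank.add(feats.detach(), is_prompted=(t == 0))  # treat frame 0 as prompted

The design choice that stands out (if I'm reading it right) is the two-tier retention: prompted-frame memories are kept separately from the rolling FIFO window of recent frames, which presumably helps it re-find objects after occlusion.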

ulrikhansen54 · a year ago
A colleague of mine has written up a quick explainer on the key features (https://encord.com/blog/segment-anything-model-2-sam-2/). The memory attention module for keeping track of objects throughout a video is very clever - that's one of the trickiest problems to solve, alongside occlusion. We've spent so much time trying to fix these issues in our CV projects; now it looks like Meta has done the work for us :-)
ulrikhansen54 commented on Launch HN: Encord (YC W21) – Unit testing for computer vision models    · Posted by u/ulrikhansen54
emil_sorensen · 2 years ago
This looks promising - but how is this different from tools like Aquarium Learning or Voxel51?
ulrikhansen54 · 2 years ago
Those are both great tools, but there are a number of differences. The two most prominent are that: 1) Encord Active automatically analyses internal metrics to find the most relevant data and labels to focus on to improve model performance; and 2) it is optimised for the full 'continuous' training-data workflow, including human-in-the-loop model validation and annotation.
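
To give a concrete flavour of 1), here's a toy sketch of what metric-driven prioritisation looks like in general - the metric names and weights below are made up for illustration, not our actual internals:

    # Toy illustration (not Encord's API): rank samples by combining
    # quality metrics so the most problematic items are reviewed first.
    import pandas as pd

    df = pd.DataFrame({
        "sample_id": [1, 2, 3],
        "model_confidence": [0.95, 0.40, 0.62],    # low = model is unsure
        "label_disagreement": [0.05, 0.80, 0.30],  # high = label likely wrong
    })
    df["priority"] = 0.5 * (1 - df["model_confidence"]) + 0.5 * df["label_disagreement"]
    review_queue = df.sort_values("priority", ascending=False)  # sample 2 comes first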
ulrikhansen54 commented on Launch HN: Encord (YC W21) – Unit testing for computer vision models    · Posted by u/ulrikhansen54
esafak · 2 years ago
Can't you set usage-based pricing?

edit: It looks like you just launched appropriately early :) I assume you're aware of products like stigg.

ulrikhansen54 · 2 years ago
We run usage-based and tiered pricing, but we haven't gotten around to building out a self-serve "sign-up-with-credit-card" product yet. For all the advances in Stripe and automated billing, these things still take some time to implement for a short-staffed engineering team :-)
ulrikhansen54 commented on Launch HN: Encord (YC W21) – Unit testing for computer vision models    · Posted by u/ulrikhansen54
adrianh · 2 years ago
I had a look at your pricing page — https://encord.com/pricing/ — and was sad to see no pricing is actually communicated there.

What could I expect to pay for my company to use the Team plan?

ulrikhansen54 · 2 years ago
We base our pricing on your user count and consumption and would be happy to discuss it with you directly. Please feel free to explore the open-source (OS) version of Active at https://github.com/encord-team/encord-active. Note that some features, such as natural-language search using GPU-accelerated APIs, are not included in the OS version.
ulrikhansen54 commented on Launch HN: Encord (YC W21) – Unit testing for computer vision models    · Posted by u/ulrikhansen54
dontwearitout · 2 years ago
Does this include tools to evaluate for performance on out-of-distribution and adversarial images?
ulrikhansen54 · 2 years ago
Yes - the tool can definitely help with that. We combine the newest embedding models with various other heuristics to help identify performance outliers in your unseen data.
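
The general pattern, if you want a feel for it, is embedding + distance scoring. This toy sketch uses k-NN distance to training embeddings; the embedding source and cutoff are illustrative assumptions, not our actual pipeline:

    # Sketch only: score unseen images by mean distance to their k nearest
    # training embeddings; larger distance = more out-of-distribution.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def outlier_scores(train_emb, unseen_emb, k=10):
        knn = NearestNeighbors(n_neighbors=k).fit(train_emb)
        dists, _ = knn.kneighbors(unseen_emb)
        return dists.mean(axis=1)

    # e.g. 512-d image embeddings (random stand-ins for a real embedding model)
    train_emb = np.random.randn(10_000, 512)
    unseen_emb = np.random.randn(1_000, 512)
    scores = outlier_scores(train_emb, unseen_emb)
    flagged = np.argsort(scores)[-10:]  # top 1% most OOD images, queued for review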
ulrikhansen54 commented on Adobe's buy of Figma is 'likely' bad for developers, rules UK regulator   theregister.com/2023/11/2... · Posted by u/seanhunter
giraffe_lady · 2 years ago
You don't have any knowledge, skills, habits, techniques or workflows specific to the tool you use? If true then your use is... anomalous and you shouldn't present yourself as a typical user of this tool.
ulrikhansen54 · 2 years ago
Some. But they are replicated in (most) competitive alternatives out there, including Adobe.
ulrikhansen54 commented on Adobe's buy of Figma is 'likely' bad for developers, rules UK regulator   theregister.com/2023/11/2... · Posted by u/seanhunter
stevage · 2 years ago
Well, duh?

How could a merger like this possibly be good for consumers? You take two companies actively competing against each other and... stop one of them.

ulrikhansen54 · 2 years ago
It takes me 2 seconds to switch to a different design tool. Adobe has 30K employees and has been around for 30 years; Figma is a small, scrappy startup. It seems pretty preposterous to me that the deal will have any meaningful impact on consumers.

u/ulrikhansen54 · Karma: 244 · Cake day: August 15, 2020