Readit News
steinvakt2 · a year ago
I wish these things were described more clearly. Is this single object tracking or multi object tracking? Just a week ago SAMURAI was posted here, which is kind of the same thing, promising SOTA tracking performance using SAM2. But it only allows single object tracking, which makes it useless for many medical imaging tasks.
notreally123123 · a year ago
If it uses SAM2, it is most likely single object tracking. What prevents you from running multiple single object trackers in parallel? This would emulate a multi object tracker. If you want to be fancy, you add some logic to handle ID switches etc.
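A minimal sketch of the "many single object trackers side by side" idea, for illustration only. `SingleObjectTracker`, `ConstantVelocityTracker`, and `MultiTracker` are hypothetical names, not part of SAM2 or any tracking library, and the toy tracker ignores the frame entirely. The loop also runs the trackers one after another rather than truly in parallel.

```python
from typing import Dict, Protocol, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


class SingleObjectTracker(Protocol):
    """Hypothetical interface: any tracker exposing update(frame) -> Box."""

    def update(self, frame: object) -> Box: ...


class ConstantVelocityTracker:
    """Toy stand-in for a real single-object tracker (illustration only)."""

    def __init__(self, box: Box, velocity: Tuple[float, float]) -> None:
        self.box = box
        self.velocity = velocity

    def update(self, frame: object) -> Box:
        # Ignore the frame and shift the box by a fixed velocity.
        dx, dy = self.velocity
        x1, y1, x2, y2 = self.box
        self.box = (x1 + dx, y1 + dy, x2 + dx, y2 + dy)
        return self.box


class MultiTracker:
    """Runs one independent single-object tracker per object ID."""

    def __init__(self) -> None:
        self.trackers: Dict[int, SingleObjectTracker] = {}

    def add(self, obj_id: int, tracker: SingleObjectTracker) -> None:
        self.trackers[obj_id] = tracker

    def update(self, frame: object) -> Dict[int, Box]:
        # Each tracker sees the same frame; they never exchange information,
        # so nothing here prevents two IDs from drifting onto the same object.
        return {obj_id: t.update(frame) for obj_id, t in self.trackers.items()}


multi = MultiTracker()
multi.add(1, ConstantVelocityTracker((0, 0, 10, 10), (1, 0)))
multi.add(2, ConstantVelocityTracker((50, 50, 60, 60), (0, -2)))
for _ in range(3):
    boxes = multi.update(frame=None)
```

Because the trackers are fully independent, the wrapper itself is trivial. The hard part is everything this sketch leaves out, namely deciding what to do when two IDs end up on the same object.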
_Wintermute · a year ago
The logic to detect and attempt to fix ID switches is unfortunately a huge part of multi-object tracking.
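To give a rough idea of what even the simplest piece of that logic might look like, here is a sketch that flags pairs of tracks whose boxes overlap heavily, which is one common warning sign that two IDs have merged or are about to swap. The names `iou` and `overlapping_ids` and the 0.5 threshold are assumptions for illustration, not taken from any specific tracker. Real systems typically also use appearance features and motion models.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def overlapping_ids(
    boxes: Dict[int, Box], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Return ID pairs whose boxes overlap at or above the threshold."""
    return [
        (i, j)
        for i, j in combinations(sorted(boxes), 2)
        if iou(boxes[i], boxes[j]) >= threshold
    ]


frame_boxes = {1: (0, 0, 10, 10), 2: (1, 0, 11, 10), 3: (50, 50, 60, 60)}
suspects = overlapping_ids(frame_boxes)
```

This only detects a suspicious frame. Deciding which ID should keep which object afterwards is the part that takes most of the effort.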
ninalanyon · a year ago
Was the abstract written by ChatGPT? It's an unreadable wall of text.
brunorsini · a year ago
Does anyone know of a method for plugging the output of models like this one into traditional video editing software like Adobe Premiere?
atoav · a year ago
A thing that "always works" is outputting the transparency as a grayscale (black=transparent, white=opaque) alpha video. You can combine these in an After Effects composition which you can load directly into Premiere: https://helpx.adobe.com/premiere-pro/using/compositing-alpha...

The output of that tool also looks suspiciously like a so-called cryptomatte, which many 3D tools use to store masks for each of the objects/materials/etc. Blender can read and write those, although I am not sure whether this tool supports outputting them.
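A rough sketch of the grayscale matte idea, assuming the model gives you one binary mask per frame as nested lists of booleans (a made-up input format; a real model would more likely return arrays or tensors). It writes each frame as an 8-bit binary PGM image, with black for transparent and white for opaque. The function names are hypothetical, and turning the numbered images into a video file is left to whatever encoder or editor import you prefer.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[bool]]  # rows of per-pixel foreground flags


def mask_to_gray(mask: Mask) -> bytes:
    """Map foreground pixels to 255 (opaque) and background to 0 (transparent)."""
    return bytes(255 if px else 0 for row in mask for px in row)


def pgm_bytes(mask: Mask) -> bytes:
    """Encode one mask as a binary PGM (P5) image with a max value of 255."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + mask_to_gray(mask)


def write_matte_sequence(masks: Sequence[Mask], prefix: str = "matte") -> List[str]:
    """Write one numbered PGM file per frame and return the file paths."""
    paths = []
    for index, mask in enumerate(masks):
        path = f"{prefix}_{index:05d}.pgm"
        with open(path, "wb") as handle:
            handle.write(pgm_bytes(mask))
        paths.append(path)
    return paths
```

A hard 0/255 matte gives jagged edges. If the model exposes per-pixel probabilities, scaling those to 0..255 instead would give softer edges in the composite.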

RoseyWasTaken · a year ago
Exactly this! I experimented with a depth estimation model. An image was added as an input, the output was a depth map in grayscale, which I later used in Photoshop as a mask for Lens-blur simulation.
Infiltrator · a year ago
Final Cut Pro 11 now has a “Magnetic Mask” tool which performs well. I don’t know if this uses the same sort of models, but it is functionally what you’re looking for.
atoav · a year ago
What I'd love to see is how these tools perform with shallow depth of field shots, e.g. one actor in focus with another actor out of focus in front of them, standing in front of a street with moving traffic.

These kinds of "cinematic" shots are where automatic masking tools typically fall apart.

t55 · a year ago
https://arxiv.org/abs/2411.02844 this paper is for you
wis · a year ago
It was fun trying out the demo. With the "coffee kettle pouring" video it did really well, segmenting the man's hand and arm correctly in every frame. But with the "Find the ball cup game" video it lost the tracked cup in a strange way: it followed the cup correctly while it went behind other cups, but once the cup was no longer occluded, it switched to another cup.

It's still impressive to me how it twice kept track between occlusions, but strange how it lost track when it wasn't occluded.

https://i.imgur.com/hOSQBtw.mp4

fellowniusmonk · a year ago
Selected the left hand of the guy (hand on the right from our perspective) dribbling the soccer ball and it ends up highlighting both arms after a mid-point frame where the left arm is only partially occluded by his body. Very interesting that it will go from tracking a single discrete object to multiple discrete objects.
datadrivenangel · a year ago
"On mobile devices such as iPhone 15 Pro Max, our EfficientTAMs can run at ~10 FPS for performing video object segmentation with reasonable quality"

This is pretty impressive! Lowering the compute requirements will allow more applications to be feasible.

thot_experiment · a year ago
Interesting, I saw this: https://arxiv.org/pdf/2411.11922 on here a few days back, but I haven't actually read either paper. Would anyone who's looked at both care to give us a TL;DR?