Readit News
matroid commented on Simple 3D Packing   github.com/Vrroom/psackin... · Posted by u/matroid
avidiax · 2 months ago
Much too hard to find the original paper: https://dl.acm.org/doi/epdf/10.1145/3592126

One question I have: when we say "interlocking-free", does this mean that the algorithm can still densely stack cups (with a draft angle), or is it instead guaranteeing that the convex hulls of shapes are non-interfering?

matroid · 2 months ago
Thanks. I'll link it in the first line of the README. I think the interlocking-free part can pack cups like you suggest. They propose a flood fill algorithm which computes all the reachable placements for the voxelized shape. It doesn't make any assumptions about convexity. I think it would be a great example to try it out on, though.
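To make the reachability idea concrete, here is a minimal 2D sketch (not the repo's actual code) of that flood fill: treat each translation of the voxelized shape as a node, seed with placements at the container opening, and BFS to neighboring collision-free translations. Feasible-but-unreachable pockets (the "interlocking" cases) are exactly the placements BFS never visits.

```python
from collections import deque

import numpy as np


def reachable_placements(container, shape):
    """Flood fill over feasible translations of `shape` inside `container`.

    container: 2D boolean occupancy grid (True = occupied voxel).
    shape: 2D boolean mask of the object.
    Returns the set of (row, col) corner offsets reachable by sliding the
    shape in from the top edge without ever overlapping an occupied voxel.
    """
    H, W = container.shape
    h, w = shape.shape

    def feasible(r, c):
        # In bounds and no overlap with occupied voxels.
        if r < 0 or c < 0 or r + h > H or c + w > W:
            return False
        return not np.any(container[r:r + h, c:c + w] & shape)

    # Seed with every feasible placement touching the top edge (the "opening").
    frontier = deque((0, c) for c in range(W - w + 1) if feasible(0, c))
    seen = set(frontier)
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if nxt not in seen and feasible(*nxt):
                seen.add(nxt)
                frontier.append(nxt)
    return seen
```

A placement below a sealed-off cavity would pass the overlap test but never appear in the returned set, which is the distinction the paper's interlocking-free guarantee relies on.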
matroid commented on Simple 3D Packing   github.com/Vrroom/psackin... · Posted by u/matroid
matroid · 2 months ago
A while back, I implemented a paper that had shown up on HN, as a course project (Dense, Interlocking-Free and Scalable Spectral Packing of Generic 3D Objects).

Over the holidays, I cleaned up the implementation (with the help of Claude Code, although this is not an advertisement for it) and released it on GitHub.

If anyone needs fast 3D packing in Python, do give this a shot. Hopefully I have properly attributed all the code/ideas I used from elsewhere (if not, please feel free to let me know).
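The "spectral" part of the paper's title refers to scoring every candidate placement at once with FFT cross-correlation of the occupancy grids. This is a hedged 2D sketch of that trick, not the repo's actual API: the entry at (r, c) counts how many occupied voxels the shape would overlap if placed there, so zeros mark collision-free placements, all computed in O(n log n) instead of a sliding-window scan.

```python
import numpy as np


def overlap_map(container, shape):
    """Overlap count for every translation of `shape`, via FFT correlation.

    container: 2D boolean occupancy grid (True = occupied).
    shape: 2D boolean mask of the object.
    Returns an (H-h+1, W-w+1) integer array; entry (r, c) is the number of
    occupied container cells covered when the shape's corner sits at (r, c).
    """
    H, W = container.shape
    h, w = shape.shape
    # Cross-correlation = convolution with the flipped kernel.
    F = np.fft.rfft2(container.astype(float))
    G = np.fft.rfft2(shape[::-1, ::-1].astype(float), s=(H, W))
    full = np.fft.irfft2(F * G, s=(H, W))
    # Keep only "valid" offsets where the shape stays fully inside.
    return np.rint(full[h - 1:, w - 1:]).astype(int)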

matroid commented on FaceLift [ICCV 2025]   huggingface.co/spaces/wly... · Posted by u/matroid
matroid · 5 months ago
Hey everyone, I wanted to share my friend's work on Single Portrait Photograph to 3D Head Model. He has a Huggingface demo that you can play with!
matroid commented on Weak supervision to isolate sign language communicators in crowded news videos   vrroom.github.io/blog/202... · Posted by u/matroid
zie · 2 years ago
First: I sign ASL, not ISL, which is what the OP is talking about.

In the ASL world, most news translations into ASL are delayed or sped up from the person talking and/or the captions if they happen to also be available.

You are going to have sync problems.

Secondly, it's not just moving the hands: body movements, facial expressions, etc. all count in ASL, and I'm betting they count in ISL as well.

Thirdly, the quality of interpretation can be really bad. Horrendous. It's not so common these days, but it was fairly common that speakers would hire an interpreter and mistakenly hire someone willing to just move their arms randomly. I had it happen once at a doctor's office. The "interpreter" was just lost in space. The doctor and I started writing things down, and the interpreter seemed a little embarrassed at least.

Sometimes they hire sign language students. You can imagine hiring a first-year French student to interpret for you; it's no different, really. Sometimes they mean well, sometimes they are just there for the paycheck.

I bet it's a lot worse with ISL, because it's still very new: most students are not taught in ISL, and there are only about 300 registered interpreters for millions of deaf people in India. https://islrtc.nic.in/history-0

We are still very much struggling with voice-to-English transcription using AI, despite loads of work from lots of companies and researchers. The systems are getting better, and in ideal scenarios they are actually quite useful. Unfortunately, the world is far from ideal.

The other day, on a meeting with two people using the same phone, the AI transcription was highly confused and it went very, very wrong.

I'm not trying to discourage you, and it's great to see people trying. I wish you lots of success; just know it's not an easy thing, and I imagine lifetimes of work will be needed to build signed-language-to-written-language services that are on par with the best voice-to-text systems we have today.

matroid · 2 years ago
Thanks Zie for the message. I'm sorry to hear about your "interpreter" encounter :(

I do think these problems are much, much worse for ISL as you rightly noted.

I think I should have been careful when I said "solve" in my post. But that really came from a place of optimism/excitement.

matroid commented on Weak supervision to isolate sign language communicators in crowded news videos   vrroom.github.io/blog/202... · Posted by u/matroid
kobalsky · 2 years ago
> It's more of a bad and broken transliteration that if you struggle to think about you can parse out and understand.

It seems to be more common to see sign language interpreters now. Is it just virtue signaling to have that instead of just closed captions?

matroid · 2 years ago
Also, in India, many hearing-impaired people know only ISL.
matroid commented on Weak supervision to isolate sign language communicators in crowded news videos   vrroom.github.io/blog/202... · Posted by u/matroid
voidingw · 2 years ago
The blog post references translating between English and Indian Sign Language (ISL). I interpreted that to mean translating between spoken English and ISL, not ASL and ISL.

Regardless, I’m curious how (dis)similar ISL is to ASL.

matroid · 2 years ago
That is correct. We want to translate between English and ISL. English, because it is by and large the language of the Web, and I think we should try to connect ISL to it rather than to Indian languages.

From my understanding, they are quite dissimilar. A person who knows ISL will not understand ASL, for example.

matroid commented on Weak supervision to isolate sign language communicators in crowded news videos   vrroom.github.io/blog/202... · Posted by u/matroid
akira2501 · 2 years ago
> I believe that we can solve continuous sign language translation convincingly

American Sign Language is not English, in fact, it's not even particularly close to English. Much of the language is conveyed with body movements outside of the hands and fingers, particularly with facial expressions and "named placeholders."

> All this is to say, that we need to build a 5000 hour scale dataset for Sign Language Translation and we are good to go. But where can we find this data? Luckily news broadcasters often include special news segments for the hearing-impaired.

You need _way_ more than just 5000 hours of video. People who are deaf or hard of hearing, in my experience, dislike the interpreters in news broadcasts. It's very difficult, as an interpreter, to provide _worthwhile_ translations of what is being spoken _as_ it is being spoken.

It's more of a bad and broken transliteration that if you struggle to think about you can parse out and understand.

The other issue is that most interpreters are hearing, and so use the language slightly differently from actual deaf persons; training only on news topics will make the model very weak when it comes to understanding and interpreting anything outside of that context. ASL has "dialects" and "slang."

Hearing people always presume this will be simple. They should really just take an ASL class and work with deaf and hearing-impaired people first.

matroid · 2 years ago
Thanks for the feedback. You raise great points and this was the reason why we wrote this post, so that we can hear from people where the actual problem lies.

On a related note, this sort of explains why our model is struggling to fit our current 500-hour dataset (even on the training set). Even so, the current state of automatic translation for Indian Sign Language is that, in the wild, even individual words cannot be detected very well. We hope that what we are building might at least improve the state of the art there.

> It's more of a bad and broken transliteration that if you struggle to think about you can parse out and understand.

Can you elaborate a bit more on this? Do you think that if we built a system producing a bad/broken transliteration and funneled its output through ChatGPT, it might give meaningful results? That is, ChatGPT might be able to correct the errors, since it is a strong language model.

matroid commented on Harnessing Weak Supervision to Isolate Sign Language in Crowded News Videos   vrroom.github.io/blog/202... · Posted by u/matroid
matroid · 2 years ago
Hello everyone, we are trying to make a large dataset for Sign Language translation, inspired by BSL-1K [1]. As part of cleaning our collected videos, we use a nice technique for aggregating heuristic labels [2]. We thought it was interesting enough to share with people on here.

[1] https://www.robots.ox.ac.uk/~vgg/research/bsl1k/

[2] https://github.com/snorkel-team/snorkel
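For readers unfamiliar with [2]: each heuristic is a "labeling function" that votes a class or abstains, and an aggregator combines the noisy votes into one training label. This toy majority-vote baseline illustrates the setup; Snorkel's LabelModel goes further by learning per-function accuracies and correlations instead of counting votes equally.

```python
import numpy as np

ABSTAIN = -1  # conventional "no vote" value for a labeling function


def majority_vote(L):
    """Aggregate a label matrix L of shape (n_examples, n_functions).

    Each entry is a class id (>= 0) or ABSTAIN. Returns one label per
    example: the majority over non-abstaining votes, or ABSTAIN if every
    function abstained on that example.
    """
    out = []
    for row in L:
        votes = row[row != ABSTAIN]
        if votes.size == 0:
            out.append(ABSTAIN)
        else:
            vals, counts = np.unique(votes, return_counts=True)
            out.append(int(vals[np.argmax(counts)]))
    return np.array(out)
```

In the video-cleaning setting described above, the labeling functions would be the heuristics over each detected person (e.g. hand-motion statistics), and the aggregated label decides whether a crop is kept as a signer.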
