It's not clear from the blog post, the GitHub page, and most other places whether this will run, even in big-O terms, on:
* CPU
* 16GB GPU
* 240GB server (of the type most businesses can afford)
* Meta/Google/OpenAI/Anthropic-style data center
It seems you need lots of RAM and VRAM. Reading the issues on GitHub[1], it does not seem many others have had success using this effectively:
- someone with a 96 GB VRAM RTX 6000 Pro hit CUDA OOM errors
- someone managed to make it work on an RTX 4090, but the RTF processing time was 12...
- someone with an RTX 5090 managed to use it, but only with clips no longer than 20s
It seems the utility of the model for hobbyists with consumer-grade cards will be low.
[1]: https://github.com/facebookresearch/sam-audio/issues/24
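For anyone trying to guess whether a given card has a chance before downloading anything, a rough back-of-envelope check is weights-size vs. VRAM. This is a sketch under stated assumptions (fp16 weights at 2 bytes/parameter, a hand-wavy 1.2x overhead factor for activations; the function name and example parameter count are hypothetical, not from the SAM repo):

```python
# Back-of-envelope: do a model's weights plausibly fit in VRAM?
# Assumes fp16 weights (2 bytes/param); the 1.2x overhead factor for
# activations/workspace is a rough guess, not a measured number.

def fits_in_vram(n_params_billions, vram_gb, bytes_per_param=2, overhead=1.2):
    """Return True if the raw weights, scaled by a rough overhead
    factor, fit within the given amount of VRAM (in GiB)."""
    weights_gb = n_params_billions * 1e9 * bytes_per_param / 1024**3
    return weights_gb * overhead <= vram_gb

# Hypothetical 7B-parameter model on a 16 GB card: ~13 GiB of weights at fp16,
# so it squeaks by only if overhead stays small.
print(fits_in_vram(7, 16))
print(fits_in_vram(70, 16))
```

In practice actual peak usage depends heavily on sequence/clip length and the inference code, which is presumably why the OOM reports above vary so much between cards.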
At first, write clean code with functions and don’t obsess over call overhead. Once it works, profile, then optimize where it actually matters.
Premature optimization, etc.
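The "profile, then optimize" loop can be sketched with Python's stdlib profiler; `slow_sum` is just a hypothetical stand-in for the code under test:

```python
# Sketch: write the straightforward version first, then let the profiler
# tell you where the time actually goes before optimizing anything.
import cProfile
import pstats

def slow_sum(n):
    # Deliberately naive: a clean, obvious loop, not hand-optimized.
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(1_000_000)
profiler.disable()

# Print the top 5 entries by cumulative time; optimize only what shows up here.
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(5)
```

Only once the profile shows a real hot spot is it worth reaching for vectorization, caching, or a rewrite of that one function.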