What you get is a 3D room based on your prompt/image. It rewrites your prompt into a specific format. Overall, the rooms tend to be detailed and imaginative.
Then you can fly around the room like in Minecraft creative mode. I'm really looking forward to more editing/infill features to augment this.
This is the most impressed I've been with an AI experience since the first time I saw a model one-shot material code.
Sure, it's an early product. The visual output reminds me a lot of early SDXL. But just look at what's happened to video in the last year and to image generation in the last three. The same thing is going to happen here, and fast. I can see the vision for generative worlds in everything from gaming/media to education to RL/simulation.
Baseten: 592.6 tps
Groq: 784.6 tps
Cerebras: 4,245 tps
Still impressive work.
That said, we are serving the model at its full 131K context window, while they cap out at 33K, which could matter for some edge-case prompts.
Additionally, NVIDIA hardware is much more widely available, which matters if you're scaling a high-traffic application.
Do you guys know a website that clearly shows which open-source LLMs run on / fit into a specific GPU (setup)?
The best heuristic I could find for the necessary VRAM is: Number of Parameters × (Precision in bits / 8) × 1.2, from [0].
[0] https://medium.com/@lmpo/a-guide-to-estimating-vram-for-llms...
Your equation is roughly correct, but I tend to multiply by a factor of 2 rather than 1.2 to allow for highly concurrent traffic, since the KV cache for many simultaneous requests adds up quickly.
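To make the two rules of thumb concrete, here's a minimal sketch of the heuristic with the overhead factor as a parameter. The function name and defaults are mine, not from the linked guide; both factors are rough rules of thumb, not exact figures.

    def estimate_vram_gb(num_params_billions, precision_bits=16, overhead=1.2):
        # Rule of thumb: parameter count x bytes per parameter x overhead.
        # overhead=1.2 is the guide's allowance for activations/KV cache;
        # the reply above suggests 2.0 for highly concurrent serving.
        bytes_per_param = precision_bits / 8
        return num_params_billions * bytes_per_param * overhead

    # Example: a 70B-parameter model
    print(estimate_vram_gb(70, 16, 1.2))  # ~168 GB at FP16 -> multi-GPU territory
    print(estimate_vram_gb(70, 16, 2.0))  # ~280 GB allowing for heavy concurrency
    print(estimate_vram_gb(70, 4, 1.2))   # ~42 GB at 4-bit -> fits a single 48 GB card

As the 4-bit example shows, quantization moves the answer far more than the overhead factor does, which is why "which GPU fits this model" tables are usually broken out by precision.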