My Steam account is turning 20 today at 4:58pm (CET) and I cannot help but feel a bit old for the very first time. I still remember all my friends hating it so much in the beginning because it would slow our PCs down and cost us valuable FPS in Counter-Strike. It's amazing what Valve has achieved with Steam since then. Happy Birthday Steam!
Does anyone know what kind of infrastructure something like gpt4-32k actually runs on?
I mean when I actually type something in the prompt, what actually happens behind the scenes?
Is the answer computed on a single NVIDIA GPU?
Or is it dedicated H/W not known to the general public?
How big is that GPU?
How much RAM does it have?
Is my conversation run by a single GPU instance that is dedicated to me or is that GPU shared by multiple users?
If the latter, how many queries per second can a single GPU handle?
Where is that GPU?
Does it run in an Azure data center?
Is the API usage cost actually reflective of the HW cost or is it heavily subsidized?
Is a single GPU RAM size the bottleneck for how large a model can be?
Is any of that info public?
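On the RAM-bottleneck question, a rough back-of-envelope calculation shows why a single GPU can't hold a frontier-scale model. Note the assumptions: GPT-4's parameter count has never been disclosed, so the 1-trillion figure below is purely hypothetical, and the 80 GB capacity refers to the NVIDIA A100 80GB part.

```python
def model_memory_gb(n_params, bytes_per_param=2):
    """Memory needed just to store model weights, in GB (fp16 = 2 bytes/param)."""
    return n_params * bytes_per_param / 1e9

# Hypothetical 1-trillion-parameter model (GPT-4's real size is not public).
params = 1_000_000_000_000
weights_gb = model_memory_gb(params)
print(f"Weights alone: {weights_gb:.0f} GB")  # 2000 GB in fp16

# Compare against a single NVIDIA A100 80GB accelerator.
a100_ram_gb = 80
print(f"A100s needed just to hold the weights: {weights_gb / a100_ram_gb:.0f}")  # 25
```

This ignores activations, the KV cache for long contexts like 32k tokens, and serving overhead, all of which add more memory pressure. So yes, single-GPU RAM is a hard limit, which is why large models are sharded across many GPUs with tensor and pipeline parallelism rather than served from one card.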