We've submitted numerous GH issues and even tried to chase the developer down on LinkedIn. But he treats the project like a fun, novelty "gift" to the community and doesn't respect the SLAs that any repo maintainer needs to adhere to for my org to put their free code in production.
Currently we use Unreal Engine 5 for our hundreds of architectural physics simulations. The major issue is that UE5 is very slow on *the EC2 instance* (we only have one 2048-core EC2 instance shared between the entire office). We used to use Vercel and Cloudflare, but we had to sell our homes to suddenly subscribe to Cloudflare Enterprise (the CF sales guy told us we would not be allowed to run a CF Worker for more than 30 days without it, even though we had a CF Worker run for 37 years, and many of our CF Workers have been running since before the creation of CF (nobody knows why)), plus there was a giant spike in our Vercel Cuda Function Invocations (for GPGPU compute on the Edge, which lets architects view the collapse of their buildings with only ~53 ms of latency, compared to ~53 ms without Next.js). Ball seems much faster (it can run on a MacBook Air), potentially allowing us to save at least several tens of millions of dollars per year on AWS costs.
This isn't just a new big rocket. This is the most powerful rocket ever built, with the goal of launching it for less than the cheapest rockets cost. The current goal is $10 million per launch within a few years, and then to keep pushing it lower. For contrast, a Falcon 9 currently costs about $67 million to send 18 tons to orbit. Rocket Lab's Electron micro-rocket costs $7.5 million to send 0.3 tons to orbit. Starship can deliver 150 tons to orbit, a number that is planned to increase substantially.
The thing about space is that the potential is infinite, but it only becomes possible to start doing stuff once you get launch costs really low. Falcon 9 has brought launch costs down by orders of magnitude, but most people don't even realize this because unless you're a giant telecoms company or something, $2000/kg doesn't sound that different from $50,000/kg --- wayyyyy too expensive for anything. But now imagine a world where you could launch things for $10/kg. Suddenly the entire universe opens up to expansion and exploitation, and life as we know it would basically change overnight.
In the near future it will only cost about $150 per kg to send objects into space with Starship; so I could, for example, send a Raspberry Pi (47 grams) into LEO for ~7 dollars (as long as I also had 149 tons of other objects from other people to send). A more useful use case would be sending fully automated manufacturing facilities, probably either for semiconductors (https://www.nasa.gov/general/the-benefits-of-semiconductor-m...) or crystals (https://uofuhealth.utah.edu/newsroom/news/2017/07/proteinxl).
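To make the per-kg math concrete, here is a quick back-of-the-envelope sketch using the rough prices and payload masses quoted above (these are the figures from the comments, not official numbers, and the $10M Starship price is the aspirational target rather than today's cost):

    # Back-of-the-envelope $/kg comparison using the rough figures quoted above.
    launches = {
        # name: (launch price in USD, payload to LEO in kg)
        "Falcon 9": (67_000_000, 18_000),
        "Electron": (7_500_000, 300),
        "Starship (aspirational $10M)": (10_000_000, 150_000),
    }

    for name, (price, payload_kg) in launches.items():
        print(f"{name}: ~${price / payload_kg:,.0f}/kg")
    # Falcon 9: ~$3,722/kg, Electron: ~$25,000/kg, Starship: ~$67/kg

    # At the near-term ~$150/kg figure, a 47 g Raspberry Pi rides along for:
    print(f"Raspberry Pi (47 g): ~${0.047 * 150:.2f}")  # ~$7.05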
* While lmsys does hide the names of models until a person decides which model generated the best text, people can still figure out which language model generated a piece of text** (or have a good guess) without explicit knowledge, especially if that model is hyped up online as 'GPT-5'; even a subconscious "this text sounds like what I have seen 'gpt2-chatbot' generate online" may inadvertently influence results.
** ... though I will note that I just got a generation from 'gpt2-chatbot' that I thought was from Claude 3 (haiku/sonnet), and its competitor was LLaMa-3-70b (I thought it was 8b or Mixtral). I am obviously not good at LLM authorship attribution.
(I can't try right now because of API rate limits)
When asked about it in October last year, LMSYS replied [0]: "It is an experiment we are running currently. More details will be revealed later".
One distinguishing feature of "deluxe-chat": although it gives high-quality answers, it is very slow, so slow that the arena displays a warning whenever it is chosen as one of the competitors.
Beam search or weird attention/non-transformer architecture?
This is assuming that lmsys' GPT-2 is a retrained GPT-4t or a new GPT-4.5/5, though; I doubt that. One obvious issue: why name it GPT-2 and not something like 'openhermes-llama-3-70b-oai-tokenizer-test' (for maximum discreetness) or even 'test language model (please ignore)' (which would work well for marketing)? GPT-2 as a name doesn't really work well for marketing or privacy, at least compared to the other options.
Lmsys has tested models under weird names before: https://news.ycombinator.com/item?id=40205935