no sense spending large amounts of compute on algorithms for new math unless you can prove it can crawl.
It's also a more natural question to ask, since building projections on top of frozen foundation model embeddings is both common in an absolute sense, and much more common, relatively, than building projections off of tiny frozen networks like a ResNet-50.
Diffusion requires more compute than autoregressive models, and the excess is proportional to the sequence length. Time-dilated RNNs and adaptive computation in image recognition hint that we can compute more with the same weights and achieve better results.
Which, I believe, also hints at at least one flaw of the TS study - I did not see that they matched DLM and AR by compute; they matched them only by weights.
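To make the compute-vs-weights point concrete, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration, not something taken from the study: it uses the common rule of thumb of roughly 2 FLOPs per parameter per token for a forward pass, ignores attention cost, assumes the AR model uses a KV cache, and assumes the diffusion model re-processes the whole sequence at every denoising step. The function names are made up.

```python
def ar_flops(n_params: int, seq_len: int) -> int:
    """Rough generation cost for an autoregressive model.

    Assumes one cached forward pass per generated token and
    ~2 FLOPs per parameter per token (attention cost ignored).
    """
    return 2 * n_params * seq_len


def diffusion_flops(n_params: int, seq_len: int, steps: int) -> int:
    """Rough generation cost for a diffusion LM.

    Assumes every denoising step runs the full model over the
    whole sequence, with no caching between steps.
    """
    return 2 * n_params * seq_len * steps


def compute_ratio(n_params: int, seq_len: int, steps: int) -> float:
    """How many times more compute diffusion spends than AR at equal weights."""
    return diffusion_flops(n_params, seq_len, steps) / ar_flops(n_params, seq_len)
```

Under these assumptions the ratio is just the number of denoising steps, so if the step count scales with sequence length (say, one token unmasked per step), the excess is proportional to sequence length. Two models with identical weight counts can therefore differ in generation compute by a couple of orders of magnitude, which is why matching by weights alone does not settle the comparison.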
* create a basic text adventure (or MUD) with a very spartan api-like representation
* use an LLM to embellish the description served to the user. With recent history in context the LLM might even kinda reference things the user asked previously etc.
* have NPCs implemented as their own LLMs that are trying to 'play the game'. These might use the spartan API directly, as if they were agents.
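A minimal sketch of what the list above could look like, just to pin the idea down. All names here (`Room`, `World`, `look`, `act`, `embellish`) are invented for illustration, and `llm` is a hypothetical callable that takes a string and returns a string; no real model API is assumed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Room:
    name: str
    exits: dict[str, str]                      # direction -> room name
    items: set[str] = field(default_factory=set)


@dataclass
class World:
    rooms: dict[str, Room]
    location: dict[str, str]                   # actor -> room name
    inventory: dict[str, set[str]] = field(default_factory=dict)

    def look(self, actor: str) -> dict:
        """Spartan, API-like view of the actor's surroundings."""
        room = self.rooms[self.location[actor]]
        others = sorted(a for a, r in self.location.items()
                        if r == room.name and a != actor)
        return {"room": room.name, "exits": sorted(room.exits),
                "items": sorted(room.items), "actors": others}

    def act(self, actor: str, verb: str, arg: str) -> dict:
        """Deterministic state change; players and NPC agents share this call."""
        room = self.rooms[self.location[actor]]
        if verb == "go" and arg in room.exits:
            self.location[actor] = room.exits[arg]
            return {"ok": True}
        if verb == "take" and arg in room.items:
            room.items.remove(arg)
            self.inventory.setdefault(actor, set()).add(arg)
            return {"ok": True}
        return {"ok": False, "error": f"cannot {verb} {arg}"}


def embellish(state: dict, llm=None) -> str:
    """Turn the spartan state into prose. `llm` is a hypothetical
    callable(str) -> str; without one, the plain text is returned."""
    plain = f"You are in the {state['room']}. Exits: {', '.join(state['exits'])}."
    return llm(plain) if llm else plain
```

The human player would only ever see the output of `embellish(world.look("player"), llm)`, while an NPC agent would be handed the raw dict from `look` and asked to pick the next `act` call.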
It's a fun thought experiment!
(An aside: I found that the graphical text adventure that I made for Ludum Dare 23 is still online! Although it doesn't render quite right in modern browsers.. things shouldn't have broken! But anyway https://williame.github.io/ludum_dare_23_tiny_world/)
The challenge for me was consistency in translating free text from dialogs into classic, deterministic game state changes. But what's satisfying is that the conversations aren't just window dressing, they're part of the game mechanic.
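One way to get that consistency, sketched under assumptions: ask the model to emit a small JSON "effect" alongside its dialog, then validate it against a whitelist before touching game state. This is not a description of how the game above does it; the action names, field names, and state layout are all hypothetical.

```python
import json

# Hypothetical whitelist: action name -> exact set of required fields.
ALLOWED = {
    "give": {"item", "to"},
    "set_flag": {"flag"},
}


def parse_effect(llm_output: str):
    """Return a validated effect dict, or None if the text is not a legal effect."""
    try:
        effect = json.loads(llm_output)
    except json.JSONDecodeError:
        return None                      # free text, not a structured effect
    if not isinstance(effect, dict):
        return None
    action = effect.get("action")
    if not isinstance(action, str) or action not in ALLOWED:
        return None                      # unknown or malformed action
    if set(effect) - {"action"} != ALLOWED[action]:
        return None                      # missing or extra fields
    return effect


def apply_effect(state: dict, effect: dict) -> dict:
    """Apply a validated effect and return a new state; the input is not mutated."""
    new = {"inventory": set(state["inventory"]), "flags": set(state["flags"])}
    if effect["action"] == "give" and effect["to"] == "player":
        new["inventory"].add(effect["item"])
    elif effect["action"] == "set_flag":
        new["flags"].add(effect["flag"])
    return new
```

Anything that fails validation is simply dropped, so the conversation can say whatever it likes but only whitelisted effects ever change the deterministic state.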
How should one interpret the "prediction score"?
When used in applications (like this one), the user typically establishes a confidence threshold; every detection above that threshold is treated as a positive detection, and the rest are discarded. The choice can be arbitrary or (sorta) principled.
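A small sketch of both flavors, with made-up names and data layout (a detection is assumed to be a dict with a `"score"` key; validation data is assumed to be `(score, is_true_positive)` pairs). The "sorta principled" version here picks the lowest threshold that still hits a target precision on labeled validation data, which is one common approach among several.

```python
def filter_detections(detections, threshold=0.5):
    """Keep detections scoring at or above the threshold; discard the rest.
    (Whether the boundary is inclusive is a convention choice.)"""
    return [d for d in detections if d["score"] >= threshold]


def pick_threshold(scored, target_precision=0.9):
    """Lowest threshold whose precision on validation data meets the target.

    scored: list of (score, is_true_positive) pairs.
    Returns None if no threshold reaches the target.
    """
    ranked = sorted(scored, key=lambda p: p[0], reverse=True)
    best = None
    true_positives = 0
    for i, (score, is_tp) in enumerate(ranked, start=1):
        true_positives += bool(is_tp)
        # Only evaluate at the end of a run of tied scores, since a
        # threshold cannot separate detections with identical scores.
        last_of_tie = i == len(ranked) or ranked[i][0] != score
        if last_of_tie and true_positives / i >= target_precision:
            best = score
    return best
```

Usage would be something like `filter_detections(dets, pick_threshold(val_pairs, 0.9))`, falling back to an arbitrary default when `pick_threshold` returns `None`.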
(Context: Working in applied AI R&D for 10 years, daily user of Claude for boilerplate coding stuff and as an HTML coding assistant)
Lots of "with some tweaks I got it to work" or "we're using an agent at my company", rarely details about what's working or why, or what these production-grade agents are doing.
Why don’t you have hunter drones targeting any potential drone coming in?
It’s funny that they haven’t already. I mean, it’s about “national security.” This threat has been looming for 10, maybe 15 years now.
Development in the space is happening at a breakneck pace. We're hiring pretty aggressively; if this sort of thing seems interesting, check it out!