Yes, you can have AI tools vibe code up "new" 68k assembly for old machines, but you're never going to see them find genuinely new techniques for pushing the limits of the hardware until you give them access to actual hardware. The demoscene pushes the limits so hard that emulators have to be updated after demos are published. That makes it prohibitively expensive and difficult to employ AI for this kind of work in the manner you describe.
Don't mistake productivity for progress. There is joy in solving hard problems yourself, especially when you're the one who chose the limitations... And remember to sit back and enjoy yourself once in a while.
Speaking of, here's a demo you can sit back and enjoy: https://youtu.be/3aJzSySfCZM
Awesome demo! It's a little bit of a midlife crisis :), but superbly done! Thank you.
Right now, LLMs feel the way raw FLOPs did before anyone built systems around them: impressive, but unwieldy. You can already see the beginnings of "systems thinking" in products like Claude Code, tool-augmented agents, and memory-augmented frameworks. They’re crude, but they point toward a future where orchestration matters as much as parameter count.
I don’t think the "bitter lesson" and the "engineering problem" thesis are mutually exclusive. The bitter lesson tells us that compute + general methods win out over handcrafted rules. The engineering thesis is about how to wrap those general methods in scaffolding that gives them persistence, reliability, and composability. Without that scaffolding, we’ll keep getting flashy demos that break when you push them past a few turns of reasoning.
So maybe the real path forward is not "bigger vs. smarter," but bigger + engineered smarter. Scaling gives you raw capability; engineering decides whether that capability can be used in a way that looks like general intelligence instead of memoryless autocomplete.