That said, I think we’re still in the “GPT-3.5” phase of image editing: amazing compared to what came before, but still tripping over repeated patterns (keyboards, clocks, Go boards, hands) and sometimes refusing edits due to safety policies. The gap between hype demos and reproducible results is also very real; I’ve seen outputs veer from flawless to poor with just a tiny prompt tweak.
Reference: https://arxiv.org/abs/2507.15855
Alternative: if the Gemini Deep Think or GPT5-Pro people are listening, I think they should give free access to their models, with optional scaffolding (i.e., agentic workflows), to, say, ~100 researchers to see whether any of them can prove new math with the technology.
Reducing the cost of reasoning is a huge ongoing challenge for LLMs. We're spending so much energy and compute on reasoning that today's consumption rates would have been unexpected (to me) just one short year ago. We're literally burning forests, polluting the atmosphere, and making electricity more expensive for everyone.
DeepThink v3.1 made a notable leap in this direction recently -- significantly fewer thinking tokens at the same quality. GPT5's router was also one (important) attempt to reduce reasoning costs and make o3-quality answers available in the free tier without breaking the bank. This is also why Claude 4 is winning the coding wars against its reasoning peers -- it provides great quality without all the added reasoning tokens.
Drawing inspiration from AlphaGo and the MCMC literature -- applying tree weighting, prioritization, and pruning feels extremely appropriate for improving the quality of Deep Think (offered by Gemini & GPT5 Pro today).
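To make that concrete, here's a toy sketch (Python, every name invented) of what tree weighting, prioritization, and pruning over candidate reasoning branches could look like. score_branch and expand_branch are hypothetical stand-ins for a learned value estimate and a step generator, not anything Gemini or OpenAI actually exposes:

```python
import heapq

# Toy sketch: prioritized tree search with pruning over candidate
# reasoning branches. score_branch and expand_branch are hypothetical
# stand-ins for a learned value estimate and a step generator.

def score_branch(branch):
    # Hypothetical value estimate; here it simply rewards deeper branches.
    return len(branch)

def expand_branch(branch):
    # Hypothetical expansion: propose a few candidate next steps.
    return [branch + [f"step{len(branch)}-{i}"] for i in range(3)]

def prioritized_search(root, max_expansions=20, beam_width=2, prune_below=1):
    frontier = [(-score_branch(root), root)]  # max-heap via negated scores
    best = root
    expansions = 0
    while frontier and expansions < max_expansions:
        neg_score, branch = heapq.heappop(frontier)
        if -neg_score > score_branch(best):
            best = branch
        expansions += 1
        # Weight children by score, keep only the top beam_width,
        # and prune anything below a score threshold.
        children = sorted(expand_branch(branch), key=score_branch, reverse=True)
        for child in children[:beam_width]:
            if score_branch(child) >= prune_below:
                heapq.heappush(frontier, (-score_branch(child), child))
    return best

if __name__ == "__main__":
    print(prioritized_search(["root"]))
```

The point isn't this particular heuristic; it's that a priority queue plus a beam and a threshold already gives you a knob for trading reasoning tokens against quality.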
So, yes, more of this please. Totally the right direction.
That’s why we’re not suddenly drowning in brilliant Steam releases post-LLMs. The tech has lowered one wall, but the taller walls remain. It’s like the rise of Unity in the 2010s: the engine democratized making games, but we didn’t see a proportional explosion of good games, just more attempts. LLMs are doing the same thing for code, and image models are starting to do it for art, but neither can tell you if your game is actually fun.
The interesting question to me is: what happens when AI can not only implement but also playtest -- running thousands of iterations of your loop, surfacing which mechanics keep simulated players engaged? That’s when we start moving beyond "AI as productivity hack" into "AI as collaborator in design." We’re not there yet, but this article feels like an early data point along that trajectory.
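As a thought experiment, here's a minimal sketch of what such an automated playtest loop could look like: run many simulated players against a few mechanic variants and surface a crude engagement signal. The mechanics, the "keep playing" probabilities, and the metric are all invented for illustration -- nothing like this appears in the article:

```python
import random
from collections import defaultdict

# Hypothetical automated playtest loop: simulate many players against a few
# mechanic variants and compare a crude engagement signal (average session
# length). The mechanics and their probabilities are made up.

MECHANICS = {"double-jump": 0.80, "dash": 0.85, "grapple": 0.90}

def simulate_session(keep_playing_prob, rng):
    # Stand-in for a real simulated player: keep taking turns while "engaged".
    turns = 0
    while turns < 500 and rng.random() < keep_playing_prob:
        turns += 1
    return turns

def playtest(n_players=1000, seed=0):
    rng = random.Random(seed)
    sessions = defaultdict(list)
    for _ in range(n_players):
        mechanic = rng.choice(list(MECHANICS))
        sessions[mechanic].append(simulate_session(MECHANICS[mechanic], rng))
    # Surface which mechanics keep simulated players engaged the longest.
    return {m: sum(t) / len(t) for m, t in sessions.items()}

if __name__ == "__main__":
    for mechanic, avg_turns in sorted(playtest().items(), key=lambda kv: -kv[1]):
        print(f"{mechanic}: {avg_turns:.1f} average turns before quitting")
```

Obviously the hard part is making the simulated player behave anything like a human; the harness itself is the easy bit.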
Right now, LLMs feel like they’re at the same stage as raw FLOPs: impressive, but unwieldy. You can already see the beginnings of "systems thinking" in products like Claude Code, tool-augmented agents, and memory-augmented frameworks. They’re crude, but they point toward a future where orchestration matters as much as parameter count.
I don’t think the "bitter lesson" and the "engineering problem" thesis are mutually exclusive. The bitter lesson tells us that compute + general methods win out over handcrafted rules. The engineering thesis is about how to wrap those general methods in scaffolding that gives them persistence, reliability, and composability. Without that scaffolding, we’ll keep getting flashy demos that break when you push them past a few turns of reasoning.
So maybe the real path forward is not "bigger vs. smarter," but bigger + engineered smarter. Scaling gives you raw capability; engineering decides whether that capability can be used in a way that looks like general intelligence instead of memoryless autocomplete.
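As a rough illustration of what I mean by scaffolding, here's a minimal sketch (all names made up, not any particular product's architecture): a model call wrapped with persistent memory, retries, and composable steps:

```python
import json
import pathlib

# Hypothetical scaffolding sketch: persistence (a memory file on disk),
# reliability (retries), and composability (steps as plain functions).

MEMORY_PATH = pathlib.Path("agent_memory.json")  # invented location

def call_model(prompt):
    # Stand-in for a real model call; swap in an actual API client here.
    return f"echo: {prompt}"

def with_retries(fn, attempts=3):
    def wrapped(*args, **kwargs):
        last_err = None
        for _ in range(attempts):
            try:
                return fn(*args, **kwargs)
            except Exception as err:  # sketch only; retry on any failure
                last_err = err
        raise last_err
    return wrapped

def load_memory():
    return json.loads(MEMORY_PATH.read_text()) if MEMORY_PATH.exists() else []

def save_memory(memory):
    MEMORY_PATH.write_text(json.dumps(memory, indent=2))

def step(task):
    # One composable unit: read memory, call the model, persist the result.
    memory = load_memory()
    reply = with_retries(call_model)(f"{task}\ncontext: {memory[-3:]}")
    memory.append({"task": task, "reply": reply})
    save_memory(memory)
    return reply

if __name__ == "__main__":
    print(step("summarize the last three results"))
```

None of this is clever on its own; the point is that persistence and reliability live outside the model, and that's where the difference between a flashy demo and something durable comes from.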
Yes, you can have AI tools vibe code up "new" 68k assembly for old machines, but you're never going to see it find genuinely new techniques for pushing the limits of the hardware until you give it access to actual hardware. The demoscene pushes the limits so hard that emulators have to be updated after demos are published. That makes it prohibitively expensive and difficult to employ AI to do this work in the manner you describe.
Don't mistake productivity for progress. There is joy in solving hard problems yourself, especially when you're the one who chose the limitations... And remember to sit back and enjoy yourself once in a while.
Speaking of, here's a demo you can sit back and enjoy: https://youtu.be/3aJzSySfCZM
Awesome demo! It's a little bit of a midlife crisis :), but superbly done! Thank you.
Sounds like you haven't been in touch with the Amiga scene in quite a while if you think the above is something new. Perhaps Amiga / retro museums haven't been set up in your location, but there are heaps of them in Europe, for example. YouTube videos are a dime a dozen; just search 'amiga' on YouTube and you will find literally hundreds of channels dedicated to the Amiga and/or Commodore in general. I subscribe to many of them already, and they all provide excellent in-depth content on the Amiga, from hardware to software to games to demos.
> AI coding might unlock mass creation of new software, games, demos, music etc. What was once conceived impossible will be very possible and likely abundant soon
Why would game writing / music creation / demos / software be "once conceived impossible"? Kids were doing the very thing in their bedrooms in the 80s and 90s, without AI. What would AI bring to the table nowadays that couldn't be done in the 80s/90s when the Amiga was popular?
People developing for the Amiga were putting their heart and soul into their creations. AI can't replicate that, and it definitely can't improve it, in any sense of the word.
I'm well aware of what's available out there as online content (it's no further than a Google or YouTube search away).
Do you think what's out there as online content is what's truly possible if we had a million more Amiga enthusiasts?
That's my vision of what's to come in, say, 10-20 yrs. Imagine every Amiga game played and recorded by many (AI) users from start to finish. Every tactic explored, and cool strategies figured out. I for one would watch this.
Imagine vibe coding becoming more and more possible with 68k assembly. And having 1000x Amiga (AI) developers producing cool demo, intro and game material. New material. Novel and cutting edge material. At massive scale.
I believe this is the future we're headed toward. I for one am very excited about it.
----------
Re: A physical museum.
No, an Amiga or Commodore focus cannot be found anywhere in Silicon Valley or in the United States. Even the Computer History Museum (CHM) in Silicon Valley has very little Commodore content.
I live <1 mile away from the original Amiga offices in Los Gatos. It's a bit of a shame that there's so little Amiga or Commodore at the CHM.