Just one question: does the power supply have a 220/240v option (I'm in Australia)?
Another way to phrase this is LLM-as-compiler and Python (or whatever) as an intermediate compiler artefact.
Finally, a true 6th generation programming language!
I've considered building a toy of this with really aggressive modularisation of the output code (eg. python) and a query-based caching system so that each module of code output only changes when the relevant part of the prompt or upstream modules change (the generated code would be committed to source control like a lockfile).
I think that (+ some sort of WASM encapsulated execution environment) would be one of the best ways to write one-off things like scripts which don't need to incrementally get better and more robust over time in the way that ordinary code does.
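Roughly what I had in mind, as a sketch only (generate_module here is a stand-in for whatever LLM call actually produces the code, and the file layout is made up):

    import hashlib, json
    from pathlib import Path

    GEN_DIR = Path("generated")               # generated modules, committed like a lockfile
    LOCKFILE = GEN_DIR / "modules.lock.json"

    def module_key(prompt_section, upstream_keys):
        # A module's cache key depends only on its slice of the prompt and its upstream modules.
        h = hashlib.sha256(prompt_section.encode())
        for k in sorted(upstream_keys):
            h.update(k.encode())
        return h.hexdigest()

    def get_module(name, prompt_section, upstream_keys):
        # Regenerate a module only when its prompt section or an upstream module changed.
        GEN_DIR.mkdir(exist_ok=True)
        lock = json.loads(LOCKFILE.read_text()) if LOCKFILE.exists() else {}
        key = module_key(prompt_section, upstream_keys)
        path = GEN_DIR / f"{name}.py"
        if lock.get(name) == key and path.exists():
            return path, key                  # cache hit: reuse the committed code as-is
        code = generate_module(prompt_section)    # hypothetical LLM call
        path.write_text(code)
        lock[name] = key
        LOCKFILE.write_text(json.dumps(lock, indent=2))
        return path, key

The point of returning the key is that downstream modules feed it into their own module_key, so a change anywhere upstream cascades, while unrelated modules stay byte-identical in the repo.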
> because omniscient-yet-dim-witted models terminate at "superhumanly assistive"
It might be that with dim wits + enough brute force (knowledge, parallelism, trial-and-error, specialisation, speed) models could still substitute for humans and transform the economy in short order.
I've never seen this question quantified in a really compelling way, and while interesting, I'm not sure this PDF succeeds, at least not well enough to silence dissent. I think AI maximalists will continue to think that the models are in fact getting less dim-witted, while the AI skeptics will continue to think these apparent gains are in fact entirely a byproduct of "increasing" "omniscience." The razor will have to be a lot sharper before people start moving between these groups.
But, anyway, it's still an important question to ask, because omniscient-yet-dim-witted models terminate at "superhumanly assistive" rather than "Artificial Superintelligence", which in turn economically means "another bite at the SaaS apple" instead of "phase shift in the economy." So I hope the authors will eventually succeed.
I'm bullish (and scared) about AI progress precisely because I think they've only gotten a little less dim-witted in the last few years, but their practical capabilities have improved a lot thanks to better knowledge, taste, context, tooling etc.
What scares me is that I think there's a reasoning/agency capabilities overhang. ie. we're only one or two breakthroughs away from something which is both kinda omniscient (where we are today), and able to out-think you very quickly (if only through dint of applying parallelism to actually competent outcome-modelling and strategic decision making).
That combination is terrifying. I don't think enough people have really imagined what it would mean for an AI to be able to out-strategise humans in the same way that they can now — say — out-poetry humans (by being both decent in terms of quality and super fast). It's like when you're speaking to someone way smarter than you and you realise that they're 6 steps ahead, and actively shaping your thought process to guide you where they want you to end up. At scale. For everything.
This exact thing (better reasoning + agency) is also the top priority for all of the frontier researchers right now (because it's super useful), so I think a breakthrough might not be far away.
Another way to phrase it: I think today's LLMs are about as good at snap judgements in most areas as the best humans (probably much better at everything that rhymes with inferring vibes from text), but they kinda suck at:
1. Reasoning/strategising step-by-step for very long periods
2. Snap judgements about reasoning or taking strategic actions (in the way that expert strategic humans don't actually need to think through their actions step-by-step very often - they've built intuition which gets them straight to the best answer 90% of the time)
Getting good at the long range thinking might require more substantial architectural changes (eg. some sort of separate 'system 2' reasoning architecture to complement the already pretty great 'system 1' transformer models we have). OTOH, it might just require better training data and algorithms so that the models develop good enough strategic taste and agentic intuitions to get to a near-optimal solution quickly before they fall off a long-range reasoning performance cliff.
Of course, maybe the problem is really hard and there's no easy breakthrough (or it requires 100,000x more computing power than we have access to right now). There's no certainty to be found, but a scary breakthrough definitely seems possible to me.
This really doesn't accord with my own experience. Using claude-code (esp. with opus 4) and codex (with o3) I've written lots of good Rust code. I've actually found Rust helps the AI-pair-programming experience because the agent gets such good, detailed feedback from the compiler that it can iterate very quickly and effectively.
Can it set up great architecture for a large, complex project from scratch? No, not yet. It can't do that in Ruby or Typescript either (though it might trick you by quickly getting something that kinda works in those languages). I think that will be a higher bar because of how Rust front-loads a lot of hard work, but I expect continuing improvement.
Architecting the original 100kloc program well requires skill, but that effort is heavily front loaded.
Human brains seem like an existence proof for what’s possible, but it would be surprising if humans also represent the farthest physical limits of what’s technologically possible without the constraints of biology (hip size, energy budget etc).
Poker tests intelligence. So what gives? One interesting thing is that, for whatever reason, poker performance isn't used as a benchmark in the LLM showdown between big tech companies.
The models have definitely improved in the past few years. I'm skeptical that there's been a "breakthrough", and I'm growing more skeptical of the exponential growth theory. It looks to me like the big tech companies are just throwing huge compute and engineering budgets at the existing transformer tech, to improve benchmarks one by one.
I'm sure if Google allocated 10 engineers and a dozen million dollars to improving Gemini's poker performance, it would increase. The idea behind AGI and the exponential growth hypothesis is that you don't have to do that, because the AI gets smarter in a general sense all on its own.
> improve benchmarks one by one
If you're right about that in the strong sense — that each task needs to be optimised in total isolation — then it would be a longer, slower road to a really powerful humanlike system.
What I think is really happening, though, is that progress on each specific task (eg. coding) is having large spillover effects on other areas (eg. helping the models to be better at extended verbal reasoning even when not writing any code). The AI labs can't do everything at once, so they're focusing where:
- It's easy to generate more data and measure results (coding, maths etc.)
- There's a relative lack of good data in the existing training corpus (eg. good agentic reasoning logic - the kinds of internal monologues that humans rarely write down)
- It would be immediately useful for the models to get better in a targeted way (eg. agentic tool-use; developing great hypothesis generation instincts in scientific fields like algorithm design, drug discovery and ML research)
By the time those tasks are optimised, I suspect the spillover effects will be substantial and the models will generally be much more capable.
Beyond that, the labs are all pretty open about the fact that they want to use the resulting AI coding, reasoning and research skills to accelerate their own research. If that works (definitely not obvious yet) then finding ways to train a much broader array of skills could be much faster, because that process itself would be increasingly automated.
Even a simple prompt like this:
=
I have two potential solutions.
Solution A:
Solution B:
Which one is better and why?
=
is biased: some LLMs tend to choose the first option, and others prefer the last one.
(Of course, humans suffer from the same kind of bias too: https://electionlab.mit.edu/research/ballot-order-effects)
You just want the signal from the object-level question to drown out irrelevant bias (which plan was proposed first, which plan's proposer is more attractive, which plan seems cooler etc.)
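One cheap way to get more of that signal is to ask the comparison in both orders and only trust a verdict that survives the swap. A rough sketch (ask_llm is a stand-in for whatever model call you'd actually use):

    PROMPT = (
        "I have two potential solutions.\n\n"
        "Solution 1:\n{first}\n\n"
        "Solution 2:\n{second}\n\n"
        "Which one is better and why? Start your answer with '1' or '2'."
    )

    def compare(solution_a, solution_b):
        # Ask twice with the order flipped so positional bias cancels out.
        verdict_ab = ask_llm(PROMPT.format(first=solution_a, second=solution_b))  # hypothetical call
        verdict_ba = ask_llm(PROMPT.format(first=solution_b, second=solution_a))
        a_wins_first_pass = verdict_ab.strip().startswith("1")
        a_wins_second_pass = verdict_ba.strip().startswith("2")   # A sat in slot 2 this time
        if a_wins_first_pass and a_wins_second_pass:
            return "A"
        if not a_wins_first_pass and not a_wins_second_pass:
            return "B"
        return "order-dependent"   # positional bias swamped the object-level signal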