Here's the thing: I get it, and it's easy to argue for this and difficult to argue against it. BUT
It's not intelligent. It just is not. It's tremendously useful and I'd forgive someone for thinking the intelligence is real, but it's not.
Perhaps it's just a poor choice of words. What a LOT of people really mean is something more along the lines of Synthetic Intelligence.
That is, however difficult it might be to define, REAL intelligence that was made, not born.
Transformer and Diffusion models aren't intelligent; they're just very well-trained statistical models. We actually (metaphorically) have a million monkeys at a million typewriters for a million years creating Shakespeare.
My efforts manipulating LLMs into doing what I want are pretty darn convincing that I'm cajoling a statistical model, not interacting with an intelligence.
A lot of people won't be convinced that there's a difference, and convincing them is hard when I'm also saying it might not be possible to have a definition of "intelligence" that is both satisfactory and testable.
Ctrl-F 'code', 0 results
What is this comment about?
There are also plenty of reasons not to use proprietary US models for comparison: the major US models haven't been living up to their benchmarks; their releases rarely include training and architectural details; they're not terribly cost-effective; their announcements often omit comparisons with non-US models; and the performance delta between model releases has plateaued.
A decent number of users in r/LocalLlama have reported switching back from Opus 4.5 to Sonnet 4.5 because Opus's real-world performance was worse. From my vantage point, it seems like trust in OpenAI, Anthropic, and Google is waning, and this lack of comparison is another symptom.
I agree it was arbitrary and unjust. I deserved to be a citizen.
(In more ordinary circumstances it's merely arbitrary and unjust.)
Systems can do worse things than crashing in response to unexpected states, but they can also do better: report the problem and terminate gracefully. Especially when the code runs on so many nodes and a crash leaves them unresponsive.
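A minimal sketch of that distinction, assuming a toy request handler; handle_request, RequestError, and the inputs are made up for illustration, not anyone's actual code:

```rust
// An unexpected state is reported and the process shuts down in an
// orderly way, rather than panicking or hanging unresponsive.
#[derive(Debug)]
struct RequestError(String);

fn handle_request(input: &str) -> Result<u32, RequestError> {
    input
        .parse()
        .map_err(|_| RequestError(format!("unexpected payload: {input:?}")))
}

fn main() {
    for input in ["17", "not-a-number"] {
        match handle_request(input) {
            Ok(n) => println!("handled request: {n}"),
            Err(e) => {
                // The "do better" path: report the bad state, clean up,
                // and exit with a clear status a supervisor can act on.
                eprintln!("fatal: {e:?}; reporting and terminating gracefully");
                // Flushing logs / closing connections would go here.
                std::process::exit(1);
            }
        }
    }
}
```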
Blue/green deployment isn't always possible, but my imagination is a bit weak here: I cannot suggest a way to synchronously update so many nodes spread literally all over the internet. In a large distributed system, a blue/green deployment happens willy-nilly anyway; it's better when it happens in a controlled way, with the safety of a change that affects basically the entire fleet tested under real load before it's applied everywhere.
I do not even assume that any of Cloudflare's code was ever shipped with the "move fast, break things" mindset; I only posit that such a mindset is not optimal for a company in Cloudflare's position. Their motto might rather be "move smoothly, never break anything"; I suppose most of their customers value stability more than the speed of releasing features, or whatnot.
Starting with questions is the right way, I agree. My first question: why would calling unwrap() ever be a good idea in production code, especially in config-loading code, which, to my mind, should be resilient and ready to handle variations in the config data gracefully? Certain mechanical patterns, like "don't hit your finger with a hammer", are best applied universally by default, with the rare exceptional cases carefully documented and explained, not the other way around.
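To make the question concrete, here is a toy sketch of the two styles, assuming a one-value config file; load_limit_fragile, load_limit_resilient, and features.conf are invented for illustration and have nothing to do with Cloudflare's actual code:

```rust
use std::fs;

const DEFAULT_LIMIT: usize = 200;

// The fragile style: any surprise in the data takes the whole process down.
#[allow(dead_code)]
fn load_limit_fragile(path: &str) -> usize {
    let text = fs::read_to_string(path).unwrap(); // panics if the file is missing
    text.trim().parse().unwrap() // panics if the contents aren't a number
}

// The resilient style: report what's wrong and fall back to a safe default.
fn load_limit_resilient(path: &str) -> usize {
    let text = match fs::read_to_string(path) {
        Ok(text) => text,
        Err(e) => {
            eprintln!("config {path}: unreadable ({e}); using default {DEFAULT_LIMIT}");
            return DEFAULT_LIMIT;
        }
    };
    match text.trim().parse() {
        Ok(limit) => limit,
        Err(e) => {
            eprintln!("config {path}: bad limit ({e}); using default {DEFAULT_LIMIT}");
            DEFAULT_LIMIT
        }
    }
}

fn main() {
    println!("limit = {}", load_limit_resilient("features.conf"));
}
```

The resilient version is longer, but every failure mode is named, logged, and survivable; that's the documentation-by-default the comment argues for.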
* The deployment should have followed the blue/green pattern, limiting the blast radius of a bad change to a subset of nodes (a sketch of what that could look like follows this list).
* In general, a company so foundational to internet connectivity should not follow the "move fast, break things" pattern. They did not have an overwhelming reason to hurry and take risks. This has burned a lot of trust, no matter the nature of the actual bug.
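For the blue/green point, here is a minimal sketch of one way to bound blast radius; everything in it (in_green_cohort, the node IDs, the 5% stage) is made up for illustration and is not Cloudflare's actual rollout mechanism:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Each node deterministically decides whether it's in the "green" cohort
// that receives the new change, based on a stable node ID. The percentage
// is ratcheted up only after the change survives real load on the early
// cohort, so a bad change can never take out the whole fleet at once.
fn in_green_cohort(node_id: &str, rollout_percent: u64) -> bool {
    let mut hasher = DefaultHasher::new();
    node_id.hash(&mut hasher);
    hasher.finish() % 100 < rollout_percent
}

fn main() {
    // Stage 1: 5% of the fleet takes the change under real traffic.
    for node in ["edge-ams-01", "edge-sfo-02", "edge-nrt-03"] {
        let cohort = if in_green_cohort(node, 5) { "green (new)" } else { "blue (old)" };
        println!("{node}: {cohort}");
    }
}
```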
Actually learning from incidents and making systems more reliable requires curiosity and a willingness to start with questions rather than mechanically applying patterns. This is standard systems-safety stuff. The sort of false confidence involved in making prescriptions from afar suggests a mindset I don't want anywhere near the operation of anything critical.
https://maurycyz.com/misc/the_cost_of_trash/#:~:text=throw%2...