kalkin commented on OpenAI are quietly adopting skills, now available in ChatGPT and Codex CLI   simonwillison.net/2025/De... · Posted by u/simonw
ipaddr · 4 days ago
This could be a solved problem: come up with problems that aren't online and compare. Later, use LLMs to sort through your problems and classify them from easy to difficult.
kalkin · 3 days ago
How do you imagine existing benchmarks were created?
kalkin commented on OpenAI are quietly adopting skills, now available in ChatGPT and Codex CLI   simonwillison.net/2025/De... · Posted by u/simonw
colechristensen · 4 days ago
>What is AGI? Artificial. General. Intelligence.

Here's the thing: I get it, and it's easy to argue for this and difficult to argue against it. BUT

It's not intelligent. It just is not. It's tremendously useful and I'd forgive someone for thinking the intelligence is real, but it's not.

Perhaps it's just a poor choice of words. What a LOT of people really mean is something more along the lines of Synthetic Intelligence.

That is, however difficult it might be to define, REAL intelligence that was made, not born.

Transformer and diffusion models aren't intelligent; they're just very well-trained statistical models. We actually (metaphorically) have a million monkeys at a million typewriters for a million years creating Shakespeare.

My efforts manipulating LLMs into doing what I want are pretty darn convincing evidence that I'm cajoling a statistical model, not interacting with an intelligence.

A lot of people won't be convinced that there's a difference, and it's hard to convince them when I'm saying it might not be possible to have a definition of "intelligence" that is satisfactory and testable.

kalkin · 3 days ago
If you can't define intelligence in a way that distinguishes AIs from people (and doesn't just bake that conclusion baldly into the definition), consider whether your insistence that only one is REAL is a conclusion from reasoning or something else.
kalkin commented on Horses: AI progress is steady. Human equivalence is sudden   andyljones.com/posts/hors... · Posted by u/pbui
underyx · 8 days ago
Ctrl-F 'lines', 0 results

Ctrl-F 'code', 0 results

What is this comment about?

kalkin · 8 days ago
Charitably, I'm guessing it's supposed to be an allusion to the chart with cost per word? Which measures an input cost, not an output value, so the criticism still doesn't quite make sense, but it's the best I can do...
kalkin commented on Influential study on glyphosate safety retracted 25 years after publication   lemonde.fr/en/environment... · Posted by u/isolli
kalkin · 11 days ago
Did they turn out to be right? Maybe; I'm not familiar with the research here, but no evidence for that has actually been posted. This study being untrustworthy doesn't make it prove its opposite instead.
kalkin commented on Mistral 3 family of models released   mistral.ai/news/mistral-3... · Posted by u/pember
popinman322 · 14 days ago
They're comparing against open weights models that are roughly a month away from the frontier. Likely there's an implicit open-weights political stance here.

There are also plenty of reasons not to use proprietary US models for comparison: The major US models haven't been living up to their benchmarks; their releases rarely include training & architectural details; they're not terribly cost effective; they often fail to compare with non-US models; and the performance delta between model releases has plateaued.

A decent number of users in r/LocalLlama have reported that they've switched back from Opus 4.5 to Sonnet 4.5 because Opus's real-world performance was worse. From my vantage point it seems like trust in OpenAI, Anthropic, and Google is waning, and this lack of comparison is another symptom.

kalkin · 14 days ago
Scale AI wrote a paper a year ago comparing various models' performance on benchmarks to their performance on similar but held-out questions. Generally the closed-source models performed better, and Mistral came out looking pretty bad: https://arxiv.org/pdf/2405.00332
kalkin commented on Green card interviews end in handcuffs for spouses of U.S. citizens   nytimes.com/2025/11/26/us... · Posted by u/nxobject
refurb · 19 days ago
I wanted to be a citizen of Singapore and was never approved.

I agree it was arbitrary and unjust. I deserved to be a citizen.

kalkin · 19 days ago
Yes, this is by no means only a U.S. problem. Some countries are worse, even where birth rate trends seem like they should make it more obviously self-destructive. A tendency towards xenophobia seems to be an unfortunate human universal, although one we can sometimes overcome.
kalkin commented on Green card interviews end in handcuffs for spouses of U.S. citizens   nytimes.com/2025/11/26/us... · Posted by u/nxobject
JuniperMesos · 19 days ago
Only if you think it is crazy and cruel to disallow someone from another country to become a legal resident and then eventually a citizen of the US.
kalkin · 19 days ago
In many circumstances - including when that person is married to a US citizen, or when they'll likely be killed on return to their country of birth - it is indeed crazy and cruel.

(In more ordinary circumstances it's merely arbitrary and unjust.)

kalkin commented on Cloudflare outage should not have happened   ebellani.github.io/blog/2... · Posted by u/b-man
nine_k · 20 days ago
Indeed, I never worked at Cloudflare. Still, I have some nebulous idea about Cloudflare, and especially about their scale.

Systems can do worse things than crashing in response to unexpected states, but they can also do better: report them and terminate gracefully. Especially if the code runs on so many nodes and the crash renders them unresponsive.

Blue/green deployment isn't always possible, but maybe my imagination is just weak: I cannot suggest a way to synchronously update so many nodes literally all over the internet, so in a large distributed system a blue/green deployment happens willy-nilly anyway. It's better when it happens in a controlled way, with the safety of a change that affects basically the entire fleet tested under real load before it is applied everywhere.

I do not even assume that any of Cloudflare's code was ever shipped with the "move fast, break things" mindset; I only posit that such a mindset is not optimal for a company in Cloudflare's position. Their motto might rather be "move smooth, never break anything"; I suppose most of their customers value their stability higher than their speed of releasing features, or whatnot.

Starting with questions is the right way, I agree. My first question: why would calling unwrap() ever be a good idea in production code, and especially in config-loading code, which, to my mind, should be resilient and ready to handle variations in the config data gracefully? Certain mechanical patterns, like "don't hit your finger with a hammer", are best applied universally by default, with the rare exceptional cases carefully documented and explained, not the other way around.

kalkin · 20 days ago
I appreciate that this comment is much less prescriptive. I don't think I disagree with you about any general best practices here (although I do think unwrap can be fine when you can locally verify that the error or nil case is unreachable but proving that to the compiler is impractical).
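
To make that concrete, here's a minimal Rust sketch of the distinction (the FeatureConfig type and the file path are made up for illustration, not anyone's actual code): config loading degrades gracefully to a default, while the lone unwrap() is justified by a check on the lines right above it.

    use std::fs;

    #[derive(Debug, Default)]
    struct FeatureConfig {
        entries: Vec<String>,
    }

    // Resilient path: a missing or unreadable config file falls back
    // to a known-good default instead of crashing the process.
    fn load_config(path: &str) -> FeatureConfig {
        match fs::read_to_string(path) {
            Ok(raw) => FeatureConfig {
                entries: raw.lines().map(str::to_owned).collect(),
            },
            Err(e) => {
                eprintln!("config load failed ({e}); using defaults");
                FeatureConfig::default()
            }
        }
    }

    fn main() {
        let cfg = load_config("/etc/example/features.conf");

        // Locally-verifiable unwrap: the vector is guaranteed non-empty
        // by the branch above, but encoding that in the type system
        // isn't worth the ceremony here.
        let entries = if cfg.entries.is_empty() {
            vec!["default".to_owned()]
        } else {
            cfg.entries
        };
        let first = entries.first().unwrap(); // can't fail: just ensured non-empty
        println!("active entry: {first}");
    }

An unwrap like this is fine precisely because the "can't fail" argument fits on one screen; the problem is unwraps whose justification lives three modules away, if anywhere.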
kalkin commented on Cloudflare outage should not have happened   ebellani.github.io/blog/2... · Posted by u/b-man
nine_k · 20 days ago
* The unwrap() in production code should never have passed code review. Damn, it should have been flagged by a linter (see the sketch after this list).

* The deployment should have followed the blue/green pattern, limiting the blast radius of a bad change to a subset of nodes.

* In general, a company so foundational to internet connectivity should not follow the "move fast, break things" pattern. They did not have an overwhelming reason to hurry and take risks. This has burned a lot of trust, no matter the nature of the actual bug.
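
(For what it's worth, clippy does ship a lint for exactly this; it's allow-by-default, so a codebase has to opt in. A minimal sketch:)

    // Crate root attribute: turn every unwrap() into a hard clippy error.
    // One-off equivalent: cargo clippy -- -D clippy::unwrap_used
    #![deny(clippy::unwrap_used)]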

kalkin · 20 days ago
Unless you work at Cloudflare it seems very unlikely that you have enough information about systems and tradeoffs there to make these flat assertions about what "should have" happened. Systems can do worse things than crashing in response to unexpected states. Blue/green deployment isn't always possible (e.g. due to constrained compute resources) or practical (perhaps requiring greatly increased complexity), and is by no means the only approach to reducing deploy risk. We don't know that any of the related code was shipped with a "move fast, break things" mindset; the most careful developers still write bugs.
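
To give one concrete alternative (a sketch of mine, not a claim about Cloudflare's actual tooling): percentage ramps, where each node hashes itself into a fixed bucket and only picks up a change once the ramp reaches its bucket, so a bad change lands on a small, deterministic slice of the fleet first.

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Deterministically place a node in one of 100 rollout buckets.
    fn rollout_bucket(node_id: &str) -> u64 {
        let mut h = DefaultHasher::new();
        node_id.hash(&mut h);
        h.finish() % 100
    }

    // A node applies the change only once the ramp reaches its bucket.
    fn should_apply(node_id: &str, ramp_percent: u64) -> bool {
        rollout_bucket(node_id) < ramp_percent
    }

    fn main() {
        for pct in [1, 10, 50, 100] {
            println!("edge-node-42 at {pct}%: {}", should_apply("edge-node-42", pct));
        }
    }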

Actually learning from incidents and making systems more reliable requires curiosity and a willingness to start with questions rather than mechanically applying patterns. This is standard systems-safety stuff. The sort of false confidence involved in making prescriptions from afar suggests a mindset I don't want anywhere near the operation of anything critical.

kalkin commented on Messing with scraper bots   herman.bearblog.dev/messi... · Posted by u/HermanMartinus
lavela · a month ago
"Gzip only provides a compression ratio of a little over 1000: If I want a file that expands to 100 GB, I’ve got to serve a 100 MB asset. Worse, when I tried it, the bots just shrugged it off, with some even coming back for more."

https://maurycyz.com/misc/the_cost_of_trash/#:~:text=throw%2...
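
That ~1000:1 ceiling is inherent to DEFLATE (about 1030:1 on maximally repetitive input), and it's easy to check; a sketch using the flate2 crate (assumed as a dependency here, but any gzip encoder behaves the same):

    use flate2::{write::GzEncoder, Compression};
    use std::io::Write;

    // Compress 1 GiB of zeros and report the ratio: DEFLATE tops out
    // around 1030:1, so a file expanding to 100 GB still costs ~100 MB.
    fn main() -> std::io::Result<()> {
        let mut enc = GzEncoder::new(Vec::new(), Compression::best());
        let zeros = vec![0u8; 1 << 20]; // 1 MiB of zeros per write
        for _ in 0..1024 {
            enc.write_all(&zeros)?;
        }
        let compressed = enc.finish()?;
        println!(
            "1 GiB -> {} bytes (~{}:1)",
            compressed.len(),
            (1u64 << 30) / compressed.len() as u64
        );
        Ok(())
    }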

kalkin · a month ago
Ah cool that site's robots.txt is still broken, just like it was when it first came up on HN...
