z7 commented on Self-hosting a NAT Gateway   awsistoohard.com/blog/sel... · Posted by u/veryrealsid
z7 · a month ago
"You only live once."

Why state this as absolute fact? Seems a bit lacking in epistemic humility.

z7 commented on Hi, it's me, Wikipedia, and I am ready for your apology   mcsweeneys.net/articles/h... · Posted by u/imichael
babblingfish · 2 months ago
Especially relevant today with the release of grokipedia
z7 · 2 months ago
Here's the Grokipedia submission (currently censored / flagged):

https://news.ycombinator.com/item?id=45726459

z7 commented on It's insulting to read AI-generated blog posts   blog.pabloecortez.com/its... · Posted by u/speckx
z7 · 2 months ago
Hypothetically, what if the AI-generated blog post were better than what the human author of the blog would have written?
z7 commented on The dawn of the post-literate society – and the end of civilisation   jmarriott.substack.com/p/... · Posted by u/drankl
Hnrobert42 · 3 months ago
In high school debate, I found a killer piece of evidence to counter any given doomsday argument.

From Eric Zencey:

There is seduction in apocalyptic thinking. If one lives in the Last Days, one’s actions, one’s very life, take on historical meaning and no small measure of poignance.

z7 · 3 months ago
List of dates predicted for apocalyptic events:

https://en.wikipedia.org/wiki/List_of_dates_predicted_for_ap...

z7 commented on DeepMind and OpenAI win gold at ICPC   codeforces.com/blog/entry... · Posted by u/notemap
z7 · 3 months ago
Current cope collection:

- It's not a fair match, these models have more compute and memory than humans

- Contestants weren't really elite, they're just college level programmers, not the world's best

- This doesn't matter for the real world, competitive programming is very different from regular software engineering

- It's marketing, they're just cranking up the compute to unrealistic levels to gain PR points

- It's brute force, not intelligence

z7 commented on An LLM is a lossy encyclopedia   simonwillison.net/2025/Au... · Posted by u/tosh
z7 · 4 months ago
An encyclopaedia is a lossy representation of reality.
z7 commented on His psychosis was a mystery–until doctors learned about ChatGPT's health advice   psypost.org/his-psychosis... · Posted by u/01-_-
moduspol · 4 months ago
I continue to be surprised that LLM providers haven't been legally cudgeled into neutering the models from ever giving anything that can be construed as medical advice.

I'm glad--I think LLMs are looking quite promising for medical use cases. I'm just genuinely surprised there's not been some big lawsuit yet over it providing some advice that leads to some negative outcome (whether due to hallucinations, the user leaving out key context, or something else).

z7 · 4 months ago
Meanwhile this new paper claims that GPT-5 surpasses medical professionals in medical reasoning:

"On MedXpertQA MM, GPT-5 improves reasoning and understanding scores by +29.62% and +36.18% over GPT-4o, respectively, and surpasses pre-licensed human experts by +24.23% in reasoning and +29.40% in understanding."

https://arxiv.org/abs/2508.08224

z7 commented on GPT-5   openai.com/gpt-5/... · Posted by u/rd
hk__2 · 5 months ago
What do you mean? A single data point cannot be exponential. What the blog post says is that the ability of LLMs as a whole to solve tasks has grown exponentially over time, and GPT-5 fits on that curve.
z7 · 5 months ago
Yes, but the jump in performance from o3 is well beyond marginal while also fitting an exponential trend, which undermines the parent's claim on two counts.
z7 commented on GPT-5   openai.com/gpt-5/... · Posted by u/rd
henriquegodoy · 5 months ago
That SWE-bench chart with the mismatched bars (52.8% somehow appearing larger than 69.1%) was emblematic of the entire presentation - rushed and underwhelming. It's the kind of error that would get flagged in any internal review, yet here it is in a billion-dollar product launch. Combined with the Bernoulli effect demo confidently explaining how airplane wings work incorrectly (the equal transit time fallacy that NASA explicitly debunks), it doesn't inspire confidence in either the model's capabilities or OpenAI's quality control.

The actual benchmark improvements are marginal at best - we're talking single-digit percentage gains over o3 on most metrics, which hardly justifies a major version bump. What we're seeing looks more like the plateau of an S-curve than a breakthrough. The pricing is competitive ($1.25/1M input tokens vs Claude's $15), but that's about optimization and economics, not the fundamental leap forward that "GPT-5" implies. Even their "unified system" turns out to be multiple models with a router, essentially admitting that the end-to-end training approach has hit diminishing returns.

The irony is that while OpenAI maintains their secretive culture (remember when they claimed o1 used tree search instead of RL?), their competitors are catching up or surpassing them. Claude has been consistently better for coding tasks, Gemini 2.5 Pro has more recent training data, and everyone seems to be converging on similar performance levels. This launch feels less like a victory lap and more like OpenAI trying to maintain relevance while the rest of the field has caught up. Looking forward to seeing what Gemini 3.0 brings to the table.

z7 · 5 months ago
>The actual benchmark improvements are marginal at best

GPT-5 demonstrates exponential growth in task completion times:

https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...

z7 commented on GPT-5   openai.com/gpt-5/... · Posted by u/rd
doctoboggan · 5 months ago
Watching the livestream now, the improvement over their current models on the benchmarks is very small. I know they seemed to be trying to temper our expectations leading up to this, but this is much less improvement than I was expecting.
z7 · 5 months ago
GPT-5 is #1 on WebDev Arena with +75 pts over Gemini 2.5 Pro and +100 pts over Claude Opus 4:

https://lmarena.ai/leaderboard

u/z7