Readit News
thorum commented on Evaluating LLMs for my personal use case   darkcoding.net/software/p... · Posted by u/goranmoomin
thorum · 7 hours ago
> Six of the eleven picked the same movie

This is surely the greatest weakness of current LLMs for any task needing a spark of creativity.

thorum commented on Building AI products in the probabilistic era   giansegato.com/essays/pro... · Posted by u/sdan
therobots927 · 3 days ago
“The rigidness and near-perfect reliability of computer software is the unusual thing in human history, an outlier we’ve gotten used to.”

Ordered approximately by recency:

Banking? Clocks? Roman aqueducts? Mayan calendars? The sun rising every day? Predictable rainy and dry seasons?

How is software the outlier here?

thorum · 3 days ago
My point was more “humans are used to tools that don’t always work and can be used in creative ways” than “no human invention has ever been rigid and reliable”.

People on HN regularly claim that LLMs are useless if they aren’t 100% accurate all the time. I don’t think this is true. We work around that kind of thing every day.

With your examples:

- Before computers, fraud and human error were common in the banking system. We designed a system that was resilient against this and mostly worked, most of the time, well enough for most purposes, even though it was built on an imperfect foundation.

- Highly precise clocks are a recent invention. For regular people 200 years ago, one person’s clock would often be 5-10 minutes off from someone else’s. People managed to get things done anyway.

I’ll grant you that Roman aqueducts, seasons and the sun are much more reliable than computers (as are all the laws of nature).

thorum commented on Building AI products in the probabilistic era   giansegato.com/essays/pro... · Posted by u/sdan
thorum · 3 days ago
I like this framing, but I don’t think it’s entirely new to LLMs. Humans have been building flexible, multi-purpose tools and using them for things the original inventor or manufacturer didn’t think of since before the invention of the wheel. It’s in our DNA. Our brains have been shaped by a world where that is normal.

The rigidness and near-perfect reliability of computer software is the unusual thing in human history, an outlier we’ve gotten used to.

thorum commented on A short statistical reasoning test   emiruz.com/post/2025-08-1... · Posted by u/usgroup
thorum · 5 days ago
I don’t know enough about statistics to answer these with math, but I’ve been on quite a few buses, and it’s common at some stops for bus arrivals to cluster around specific times. If you always leave after the first bus you see, and most of your random observations start before the first bus in a cluster, won’t you (almost) always miss the others?

thorum commented on Llama-Scan: Convert PDFs to Text W Local LLMs   github.com/ngafar/llama-s... · Posted by u/nawazgafar
thorum · 7 days ago
I’ve been trying to convert a dense 60-page paper document to Markdown today from photos taken on my iPhone. I know this is probably not the best way to do it, but it’s still been surprising to find that even the latest cloud models are struggling to process many of the pages. Lots of hallucination and “I can’t see the text” (when the photo is perfectly clear). Lots of retrying different models, switching between LLMs and old-fashioned OCR, and reading and correcting mistakes myself. It’s still faster than doing the whole transcription manually, but I thought the tech was further along.

thorum commented on Model intelligence is no longer the constraint for automation   latentintent.substack.com... · Posted by u/drivian
thorum · 9 days ago
This article is insightful, but I blinked when I saw the headline “Reducing the human bottleneck” used without any apparent irony.

At some point we should probably take a step back and ask “Why do we want to solve this problem?” Is a world where AI systems are highly intelligent tools, but humans are needed to manage the high level complexity of the real world… supposed to be a disappointing outcome?

thorum commented on My AI-driven identity crisis   dusty.phillips.codes/2025... · Posted by u/wonger_
thorum · 10 days ago
I think we’re going to have to set bigger goals for ourselves. We’re all still figuring out what that means.

thorum commented on The surprise deprecation of GPT-4o for ChatGPT consumers   simonwillison.net/2025/Au... · Posted by u/tosh
CodingJeebus · 16 days ago
> or trying prompt additions like “think harder” to increase the chance of being routed to it.

Sure, manually selecting a model may not have been ideal. But manually prompting to get your model feels like an absurd hack.

thorum · 16 days ago
We need a new set of UX principles for AI apps. If users need to access an AI feature multiple times a day, it should be a button.

thorum commented on Cursed Knowledge   immich.app/cursed-knowled... · Posted by u/bqmjjx0kac
thorum · 16 days ago
> npm scripts make a http call to the npm registry each time they run, which means they are a terrible way to execute a health check.

Is this true? I couldn’t find another source discussing it. That would be insane behavior for a package manager.

thorum commented on GPT-5: Key characteristics, pricing and system card   simonwillison.net/2025/Au... · Posted by u/Philpax
falcor84 · 17 days ago
Oh wow, so essentially a full year of post-training and testing. Or was it ready, and there was a sufficiently good business strategy decision to postpone the release?

thorum · 17 days ago
The Information’s report from earlier this month claimed that GPT-5 was only developed in the last 1-2 months, after some sort of breakthrough in training methodology.

> As recently as June, the technical problems meant none of OpenAI’s models under development seemed good enough to be labeled GPT-5, according to a person who has worked on it.

But it could be that this refers to post-training and the base model was developed earlier.

https://www.theinformation.com/articles/inside-openais-rocky...

https://archive.ph/d72B4

u/thorum

Karma: 2411 · Cake day: December 30, 2012