Readit News
npunt commented on Auto-grading decade-old Hacker News discussions with hindsight   karpathy.bearblog.dev/aut... · Posted by u/__rito__
slg · 4 days ago
And this is a perfect example of how some people respond to LLMs, bending over backwards to justify the output like we are some kids around a Ouija board.

"The LLM isn't misinterpreting the text, it's just representing people who misinterpreted the text" isn't the defense you seem to think it is.

npunt · 4 days ago
And your response here is a perfect example of confidently jumping to conclusions on what someone's intent is... which is exactly what you're saying the LLM did to you.

I scoped my comment specifically around what a reasonable human answer would be if one were asked the particular question it was asked with the available information it had. That's all.

Btw I agree with your comment that it hallucinated/assumed your intent! Sorry I did not specify that. This was a bit of a 'play stupid games win stupid prizes' prompt by the OP. If one asks an imprecise question, one should not expect a precise answer. The negative externality here is that readers' takeaways are based on false precision. So is it the fault of the question asker, the readers, the tool, or some mix? The tool is the easiest to change, so probably deserves the most blame.

I think we'd both agree LLMs are notoriously overly helpful and provide low-confidence responses to things they should just not comment on. That to me is the underlying issue - at the very least they should respond like humans do, not only in content but in confidence. It should have said it wasn't confident about its response to your post, and OP should have thus thrown its response out.

Rarely do we have perfect info; in regular communication we're always making assumptions which affect our confidence in our answers. The question is: what's the confidence threshold we should use? That's the question to ask before 'is it actually right?', which is also an important question, but one I think LLMs are a lot better at than the former.

Fwiw you can tell most LLMs to update their memory to always give you a confidence score from 0.0-1.0. This helps tremendously, it's pretty darn accurate, it's something you can program thresholds around, and I think it should be built into every LLM response.
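As a sketch of what "program thresholds around it" could look like: everything below is hypothetical (the response format, the `parse_confidence` helper, the 0.7 threshold), assuming you've instructed the model to end each answer with a `confidence: 0.NN` line.

```python
import re

CONFIDENCE_THRESHOLD = 0.7  # acceptable bar; will differ per question & domain

def parse_confidence(response: str) -> float:
    """Pull a trailing 'confidence: 0.0-1.0' score out of an LLM response.

    Returns 0.0 if no score is found, i.e. unscored answers are treated as untrusted.
    """
    match = re.search(r"confidence:\s*([01](?:\.\d+)?)", response, re.IGNORECASE)
    return float(match.group(1)) if match else 0.0

def accept_answer(response: str, threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """Keep the answer only if the model's self-reported confidence clears the bar."""
    return parse_confidence(response) >= threshold

# A hedged response gets thrown out; a confident one passes.
hedged = "Probably option B, but the question is ambiguous.\nconfidence: 0.4"
confident = "The capital of France is Paris.\nconfidence: 0.98"
assert not accept_answer(hedged)
assert accept_answer(confident)
```

The point isn't the parsing; it's that once confidence is explicit and machine-readable, the caller (not the model) decides whether the answer is usable.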

The way I see it, LLMs have lots and lots of negative externalities that we shouldn't bring into this world (I'm particularly sensitive to the effects on creative industries), and I detest how they're being used so haphazardly, but they do have some uses we also shouldn't discount and figure out how to improve on. The question is where are we today in that process?

The framework I use to think about how LLMs are evolving is that of transitioning mediums. Movies started as a copy/paste of stage plays before they settled into the medium and learned to work along the grain of its strengths & weaknesses to create new conventions. Speech & text are now transitioning into LLMs. What is the grain we need to go along?

My best answer is the convention LLMs need to settle into is explicit confidence, and each question asked of them should first be a question of what the acceptable confidence threshold is for such a question. I think every question and domain will have different answers for that, and we should debate and discuss that alongside any particular answer.

npunt commented on Auto-grading decade-old Hacker News discussions with hindsight   karpathy.bearblog.dev/aut... · Posted by u/__rito__
slg · 6 days ago
This is a perfect example of the power and problems with LLMs.

I took the narcissistic approach of searching for myself. Here's a grade of one of my comments[1]:

>slg: B- (accurate characterization of PH’s “networking & facade” feel, but implicitly underestimates how long that model can persist)

And here's the actual comment I made[2]:

>And maybe it is the cynical contrarian in me, but I think the "real world" aspect of Product Hunt it what turned me off of the site before these issues even came to the forefront. It always seemed like an echo chamber were everyone was putting up a facade. Users seemed more concerned with the people behind products and networking with them than actually offering opinions of what was posted.

>I find the more internet-like communities more natural. Sure, the top comment on a Show HN is often a critique. However I find that more interesting than the usual "Wow, another great product from John Developer. Signing up now." or the "Wow, great product. Here is why you should use the competing product that I work on." that you usually see on Product Hunt.

I did not say nor imply anything about "how long that model can persist", I just said I personally don't like using the site. It's a total hallucination to claim I was implying doom for "that model", and you would only know that if you actually took the time to dig into the details of what was actually said; the summary seems plausible enough that most people never would.

The LLM processed and analyzed a huge amount of data in a way that no human could, but the single in-depth look I took at that analysis was somewhere between misleading and flat out wrong. As I said, a perfect example of what LLMs do.

And yes, I do recognize the funny coincidence that I'm now doing the exact thing I described as the typical HN comment a decade ago. I guess there is a reason old me said "I find that more interesting".

[1] - https://karpathy.ai/hncapsule/2015-12-18/index.html#article-...

[2] - https://news.ycombinator.com/item?id=10761980

npunt · 5 days ago
I'm not so sure; that may not have been what you meant, but that doesn't mean it's not what others read into it. The broader context is that HN is a startup forum, and one of the most common discussion patterns is 'I don't like it', which is often a stand-in for 'I don't think it's viable as-is'. Startups are default dead, after all.

With that context, if someone were to read your comment and be asked 'does this person think the product's model is viable in the long run' I think a lot of people would respond 'no'.

npunt commented on Deprecate like you mean it   entropicthoughts.com/depr... · Posted by u/todsacerdoti
npunt · 5 days ago
Filed under: trying to solve another domain's problem with only the tools in your domain (this is almost always a bad idea)
npunt commented on Auto-grading decade-old Hacker News discussions with hindsight   karpathy.bearblog.dev/aut... · Posted by u/__rito__
npunt · 5 days ago
One of the few use cases for LLMs that I have high hopes for and feel is still underappreciated is grading qualitative things. LLMs are the first tech (afaik) that can do top-down analysis of phenomena in a manner similar to humans, which means a lot of important human use cases that are judgement-oriented can become more standardized, faster, and more readily available.

For instance, one of the unfortunate aspects of social media that has become so unsustainable and destructive to modern society is how it exposes us to so many more people and hot takes than we have the ability to adequately judge. We're overwhelmed. This has led to conversation being dominated by really shitty takes and really shitty people, who rarely if ever suffer reputational consequence.

If we build our mediums of discourse with more reputational awareness using approaches like this, we can better explore the frontier of sustainable positive-sum conversation at scale.

Implementation-wise, the key question is how do we grade the grader and ensure it is predictable and accurate?

npunt commented on Frank Gehry has died   bbc.co.uk/news/articles/c... · Posted by u/ksajadi
npunt · 11 days ago
I grew up a few blocks from his funky Santa Monica house [1], passed by it all the time. When you’re a kid you typically see wild new things like that as just normal because you have no context for how unusual they are. His house defied that perspective; even as a kid you understood that being wrapped in oddly angled chain link fences and corrugated metal is just... different. It's an unanswered question, a loose thread, a thing you can't unknow.

I don't particularly like the house - it's meant to be challenging, not beautiful - but with perspective I see now there aren't many creations out there that achieve existence in eternal confusion like it does for me. I see his other works like Bilbao [2] and Disney Hall as refinements on the concept with the added dimension of beauty. They're not quite as memorable, but I think they do a great job exploring the frontier of beauty and befuddlement.

[1] https://en.wikipedia.org/wiki/Gehry_Residence

[2] especially the aerial perspective https://en.wikipedia.org/wiki/Guggenheim_Museum_Bilbao#/medi...

npunt commented on UniFi 5G   blog.ui.com/article/intro... · Posted by u/janandonly
drnick1 · 11 days ago
Why are people paying what seem like obscene prices for UniFi stuff? You probably all have spare hardware lying around that can be repurposed as a router; it does not need to be modern. I use a Ryzen 5 as a general purpose home server/router/firewall running Linux, and no ISP plastic box or expensive "prosumer" gear can touch its performance. I can push 25Gbps through it (saturating my SFP28 LAN), or north of 4Gbps through Wireguard. For access points in a home setting, TP-Link boxes flashed with OpenWrt are also considerably better value and far more "free" (i.e., unclouded) than any UniFi stuff constantly phoning home for "updates."
npunt · 11 days ago
Perhaps you missed the product positioning: “Simple setup and clean design”

Not everyone wants to fix old hardware and configure Linux on their weekends

npunt commented on Everyone in Seattle hates AI   jonready.com/blog/posts/e... · Posted by u/mips_avatar
cosmicgadget · 13 days ago
Ironic? The author is working on an AI project.
npunt · 13 days ago
The irony is that AI writing style is pretty off-putting, and the story itself was about people being put off by the author's AI project.
npunt commented on John Giannandrea to retire from Apple   apple.com/newsroom/2025/1... · Posted by u/robbiet480
shagie · 15 days ago
It was driven by privacy and on device compute.

Anything you ask an Android device or an Alexa device to do goes to their clouds to be 100% processed there.

Apple tried to make a small and focused interface that could do a limited set of things on device without going to the cloud to do it.

This was built around the idea of "Intents" and it only did the standard intents... and app developers were supposed to register and link into them.

https://developer.apple.com/documentation/intents

Some of the things didn't really get fleshed out, some are "oh, that's in there?" (Restaurant reservations? Ride booking?) and feel more like the half-baked MySQL interfaces in PHP.

However, as part of privacy - you can create a note (and dictate it) without a data connection with Siri. Your "start workout" command doesn't leave your device.

Part of that is privacy. Part of that is that Apple was trying to minimize its cloud spend (on GCP or AWS) by keeping as much of that activity on device. It wasn't entirely on device, but a lot more of it is than on Android... and Alexa is a speaker and microphone hooked up to AWS.

This was ok (kind of meh, but ok) pre-ChatGPT. With ChatGPT the expectations changed, and the architecture Apple had was not something that could pivot to meet those expectations.

https://en.wikipedia.org/wiki/Apple_Intelligence

> Apple first implemented artificial intelligence features in its products with the release of Siri in the iPhone 4S in 2011.

> ...

> The rapid development of generative artificial intelligence and the release of ChatGPT in late 2022 reportedly blindsided Apple executives and forced the company to refocus its efforts on AI.

ChatGPT was as much a blindside to Apple as the iPhone was to Blackberry.

npunt · 15 days ago
I think all of these are true:

1. Apple is big enough that it needs to take care of edge cases like offline & limited cell reception, which affect millions at any given moment.

2. Launching a major UI feature (Siri) that people will come to rely on requires offline operation for common operations like basic device operations and dictation. Major UI features shouldn't cease to function when they enter bad reception zones.

3. Apple builds devices with great CPUs, which allows them to pursue a strategy of using edge compute to reduce spend.

4. A consequence of building products with good offline support is they are more private.

5. Apple didn't even build a full set of intents for most of their apps, hence 'remind me at this location' doesn't even work. App developers haven't either, because ...

6. Siri (both the local version and remote service) isn't very good, and regularly misunderstands or fails at basic comprehension tasks that do not even require user data to be understood or relayed back to devices to execute.

I don't buy that privacy is somehow an impediment to #5 or #6. It's only an issue when user data is involved, and Apple has been investing in techs like differential privacy to get around these limitations to some extent. But that is further downstream from #5 and #6.

npunt commented on John Giannandrea to retire from Apple   apple.com/newsroom/2025/1... · Posted by u/robbiet480
bangonkeyboard · 15 days ago
> For CLIs - most reasonable commands either have a `-h`, `--help`, `-help`, `/?`, or what have you. And manpages exist. Hunt the verb isn't really a problem for CLIs.

"Hunt the verb" means that the user doesn't know which commands (verbs) exist. Which a neophyte at a blank console will not. This absolutely is a problem with CLIs.

npunt · 15 days ago
Discoverability is quite literally the textbook problem with CLIs, in that many textbooks on UI & human factors research over the last 50 years discuss the problem.
npunt commented on John Giannandrea to retire from Apple   apple.com/newsroom/2025/1... · Posted by u/robbiet480
kridsdale1 · 15 days ago
Hey, I made that!
npunt · 15 days ago
It's a great feature! I was demoing it to my parents over Thanksgiving and forgot about the lack of Siri support, and of course it failed. Parents were excited when I mentioned it but now won't be using it. Ah well.

u/npunt

Karma: 6612 · Cake day: May 15, 2013
About
Product & startup creator

Currently: Design @ Margins.app

Previously: cofounder of Daylight Computer co, cofounder @ EdSurge (education technology news co), VP Product @ OneSignal (YC S11), and an assortment of others.

Always: interest in making a better world, computing history & alternative futures, and creating sublime experiences that get us in touch with our humanity.

https://nickpunt.com
