I'll make one prediction that I think will hold up. No LLM-based system will be able to take a generic ask like "hack the nytimes website and retrieve emails and password hashes of all user accounts" and do better than the best hackers and penetration testers in the world, despite having plenty of training data to go off of. It requires out-of-band thinking that they just don't possess.
- If I want Claude Code to write some specific piece of code, it often handles the task admirably. But when I'm not sure what should be written, consulting Claude takes a lot of time and doesn't yield much insight, whereas two minutes with a human is 100x more valuable.
- I asked ChatGPT about a political event. Its answer mirrored the mainstream press. After I reminded it of some obvious facts that exposed that bias, it agreed that its initial answer was wrong.
These and other experiences remind me that current LLMs are mostly just advanced search engines. They work especially well on code because there is a lot of reasonably good code (and tutorials) out there to train on. They are far less effective on intellectual tasks that humans haven't already written about and published.
Now I want to see a rounded terminal (as in command-line apps, not the typographic terminals of letterforms). Would I type in a circle? Sounds cool.