lemming commented on The highest quality codebase   gricha.dev/blog/the-highe... · Posted by u/Gricha
postalcoder · 3 days ago
One of my favorite personal evals for llms is testing their stability as reviewers.

The basic gist of it is to give the llm some code to review and have it assign a grade multiple times. How much variance is there in the grade?

Then, prompt the same llm to be a "critical" reviewer with the same code multiple times. How much does that average critical grade change?

A low variance of grades across many generations and a low delta between "review this code" and "review this code with a critical eye" is a major positive signal for quality.

I've found that gpt-5.1 produces remarkably stable evaluations whereas Claude is all over the place. Furthermore, Claude will completely [and comically] change the tenor of its evaluation when asked to be critical whereas gpt-5.1 is directionally the same while tightening the screws.

You could also interpret these results to be a proxy for obsequiousness.

Edit: One major part of the eval I left out is "can an llm converge on an 'A'?" Let's say the llm gives the code a 6/10 (or B-). When you implement its suggestions and then provide the improved code in a new context, does the grade go up? Furthermore, can it eventually give itself an A, and consistently?

It's honestly impressive how good, stable, and convergent gpt-5.1 is. Claude is not great. I have yet to test it on Gemini 3.
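
For anyone who wants to try the stability check described above, here is a minimal sketch. It assumes you supply your own `grade_fn(prompt, code)` that calls a model and returns a numeric grade; that function, the two prompt strings, and the name `review_stability` are all hypothetical and not tied to any particular API. It only covers the variance and "critical" delta parts, not the converge-on-an-A loop.

```python
from statistics import mean, pvariance
from typing import Callable, Dict


def review_stability(
    grade_fn: Callable[[str, str], float],  # hypothetical: (prompt, code) -> numeric grade
    code: str,
    runs: int = 10,
) -> Dict[str, float]:
    """Grade the same code repeatedly under a neutral and a critical prompt."""
    # Assumed prompt wording; swap in whatever review prompts you actually use.
    neutral_prompt = "Review this code."
    critical_prompt = "Review this code with a critical eye."

    # Each call should use a fresh context so the runs are independent.
    neutral = [grade_fn(neutral_prompt, code) for _ in range(runs)]
    critical = [grade_fn(critical_prompt, code) for _ in range(runs)]

    return {
        "neutral_mean": mean(neutral),
        "neutral_variance": pvariance(neutral),
        "critical_mean": mean(critical),
        "critical_variance": pvariance(critical),
        # How far the average grade drops when the reviewer is told to be critical.
        "critical_delta": mean(neutral) - mean(critical),
    }
```

Low variances and a small `critical_delta` would be the positive signal; a large delta would be the "completely changes its tenor" behaviour.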

lemming · 3 days ago
I agree, I mostly use Claude for writing code, but I always get GPT5 to review it. Like you, I find it astonishingly consistent and useful, especially compared to Claude. I like to reset my context frequently, so I’ll often paste the problems from GPT into Claude, then get it to review those fixes (going around that loop a few times), then reset the context and get it to do a new full review. It’s very reassuring how consistent the results are.
lemming commented on Pebble Index 01 – External memory for your brain   repebble.com/blog/meet-pe... · Posted by u/freshrap6
kstrauser · 5 days ago
While I sympathize with the intent of the law, this is a great example of why it's dumb. There's no possible way you could make that ring, in a reasonably ring-sized form factor, with today's manufacturing processes in such a way that an end user could replace it.
lemming · 5 days ago
If this law pushes back against the idea that it's ok to make endless tech products which are essentially future rubbish as soon as you buy them, then I think that's a good thing. Perhaps products like this just shouldn't exist until we have better ways of dealing with the remains.
lemming commented on Kids who own smartphones before age 13 have worse mental health outcomes: Study   abcnews.go.com/GMA/Family... · Posted by u/donsupreme
senfiaj · 22 days ago
Kids can, and often do, use other family members' smartphones or tablets (I assume that's the majority of cases). How can the law prevent this if parents do nothing about it?
lemming · 22 days ago
The same way that the law prevents kids drinking their parents’ alcohol - it doesn’t. But having it be illegal sends a signal, even though it’s possible to circumvent it, and also allows prosecution if warranted.
lemming commented on Prozac 'no better than placebo' for treating children with depression, experts   theguardian.com/society/2... · Posted by u/pseudolus
robertakarobin · 23 days ago
I was very young when my mom started Prozac but do remember how angry and sad she was before compared to after.

Years later there was a time when me and my sister noticed our mom was acting a bit strange -- more snappish and irritable than usual, and she even started dressing differently. Then at dinner she announced proudly that she had been off Prozac for a month. My sister and I looked at each other and at the same time went, "Ohhhh!" Mom was shocked that we'd noticed such a difference in her behavior and started taking the medication again.

I've been on the exact same dose as her for 15 years, and my 7-year-old son just started half that dose.

If I have a good day it's impossible to say whether that's due to Prozac. But since starting Prozac I have been much more likely to have good days than bad. So, since Prozac is cheap and I don't seem to suffer any side effects, I plan to keep taking it in perpetuity.

What I tell my kids is that getting depressed, feeling sad, feeling hopeless -- those are all normal feelings that everyone has from time to time. Pills can't or shouldn't keep you from feeling depressed if you have something to be depressed about. Pills are for people who feel depressed but don't have something to be depressed about -- they have food, shelter, friends, opportunities to contribute and be productive, nothing traumatic has happened, but they feel hopeless anyway -- and that's called Depression, which is different from "being depressed."

lemming · 23 days ago
I'm very sorry to hear your story, and I'm really glad the medication has worked well for you and your family. It's early days, but it seems to be working well for ours too.

I also really admire the way you're dealing patiently with everyone in this thread arguing in bad faith; you have a lot more tolerance than I do! Hopefully it's not getting to you. Best wishes.

lemming commented on Prozac 'no better than placebo' for treating children with depression, experts   theguardian.com/society/2... · Posted by u/pseudolus
lemming · 23 days ago
> Do you have a plan to get her off, or is she on the maintenance drug for life?

It's too early to say. Obviously the idea is to get her off it if possible.

> SSRIs never help because of boosting serotonin.

That's a hell of a claim, which could use some evidence.

lemming commented on Prozac 'no better than placebo' for treating children with depression, experts   theguardian.com/society/2... · Posted by u/pseudolus
lemming · 23 days ago
Our 11 year old daughter was seriously depressed recently. N=1, but fluoxetine was life changing (and potentially life saving) for her, at least.
lemming commented on What if you don't need MCP at all?   mariozechner.at/posts/202... · Posted by u/jdkee
lemming · a month ago
Mario has some fantastic content, and has really shaped how I think about my interface to coding tools. I use a modified version of his LLM-as-crappy-state-machine model (https://github.com/badlogic/claude-commands) for nearly all my coding work now. It seems pretty clear these days that progressive discovery is the way forward (e.g. skills), and using CLI tools rather than MCP really facilitates that. I've gone pretty far down the road of writing complex LLM tooling, and the more I do that the more the simplicity and composability is appealing. He has a coding agent designed along the same principles, which I'm planning to try out (https://github.com/badlogic/pi-mono/tree/main/packages/codin...).
lemming commented on A Guide for WireGuard VPN Setup with Pi-Hole Adblock and Unbound DNS   psyonik.tech/posts/a-guid... · Posted by u/pSYoniK
mtlynch · 2 months ago
Does using NextDNS mean that you both can see a list of all the websites anyone in your family visits?
lemming · 2 months ago
Only to the domain level, not individual websites.
lemming commented on Vibe engineering   simonwillison.net/2025/Oc... · Posted by u/janpio
simonw · 2 months ago
I'm really sorry to hear this, because part of my goal here is to help push back against the idea that "programming skills are useless now, anyone can get an LLM to write code for them".

I think existing software development skills get a whole lot more valuable with the addition of coding agents. You can take everything you've learned up to this point and accelerate the impact you can have with this new family of tools.

I said a version of this in the post:

> AI tools amplify existing expertise. The more skills and experience you have as a software engineer the faster and better the results you can get from working with LLMs and coding agents.

A brand new vibe coder may be able to get a cool UI out of ChatGPT, but they're not going to be able to rig up a set of automated tests with continuous integration and continuous deployment to a Kubernetes cluster somewhere. They're also not going to be able to direct three different agents at once in different areas of a large project that they've designed the architecture for.

lemming · 2 months ago
While this is true, I definitely find that the style of the work changes a lot. It becomes much more managerial, and less technical. I feel much more like a mix of project and people manager, but without the people. I feel like the jury is still out on whether I’m overall more productive, but I do feel like I have less fun.
lemming commented on ICE Wants to Build Out a 24/7 Social Media Surveillance Team   wired.com/story/ice-socia... · Posted by u/loteck
CamperBob2 · 2 months ago
Indeed. The timeline is probably going to play out like this:

2025: Get downvoted on HN for comparing Charlie Kirk to Horst Wessel

2026: Get upvoted for it

2027: Get banned for it

2028: No voting allowed, here or anywhere else

lemming · 2 months ago
The question is when in this timeline will you get arrested or at least detained for questioning for it? We’re already at the point where you might get fired or deported for it.

u/lemming

Karma: 5196 · Cake day: July 12, 2009
About
You can get me on colin at colinfleming dot net