Readit News
stephendause commented on Two kinds of vibe coding   davidbau.com/archives/202... · Posted by u/jxmorris12
Dr_Birdbrain · a day ago
I’m unclear what has been gained here.

- Is the work easier to do? I feel like the work is harder.

- Is the work faster? It sounds like it’s not faster.

- Is the resulting code more reliable? This seems plausible given the extensive testing, but it’s unclear if that testing is actually making the code more reliable than human-written code, or simply ruling out bugs an LLM makes but a human would never make.

I feel like this does not look like a viable path forward. I’m not saying LLMs can’t be used for coding, but I suspect that either they will get better, to the point that this extensive harness is unnecessary, or they will not be commonly used in this way.

stephendause · 19 hours ago
> - Is the work faster? It sounds like it’s not faster.

The author didn't discuss the speed of the work very much. It is certainly true that LLMs can write code faster than humans, and sometimes that works well. What would be nice is an analysis of the productivity gains from LLM-assisted coding in terms of how long it took to do an entire project, start to finish.

stephendause commented on It's insulting to read AI-generated blog posts   blog.pabloecortez.com/its... · Posted by u/speckx
ManuelKiessling · 2 months ago
Why have the LLMs "learned" to write PRs (and other stuff) this way? This style was definitely not mainstream on GitHub (or Reddit) pre-LLMs, was it?

It’s strange how AI style is so easy to spot. If LLMs just follow the style that they encountered most frequently during training, wouldn’t that mean that their style would be especially hard to spot?

stephendause · 2 months ago
This is total speculation, but my guess is that human reviewers of AI-written text (whether code or natural language) are more likely to judge text with emoji check marks, dart targets, and the like as correct. (My understanding is that many of these models are fine-tuned using humans who manually review their outputs.) In other words, LLMs were inadvertently trained to seem correct, and a little message that says "Boom! Task complete! How else may I help?" subconsciously leads you to think it's correct.
stephendause commented on A definition of AGI   arxiv.org/abs/2510.18212... · Posted by u/pegasus
jal278 · 2 months ago
The fundamental premise of this paper seems flawed -- take a measure specifically designed for the nuances of how human performance on a benchmark correlates with intelligence in the real world, and then pretend as if it makes sense to judge a machine's intelligence on that same basis, when machines do best on these kinds of benchmarks in a way that falls apart when it comes to the messiness of the real world.

This paper, for example, uses the 'dual N-back test' as part of its evaluation. In humans this relates to variation in our ability to use working memory, which in humans relates to 'g'; but it seems pretty meaningless when applied to transformers -- because the task itself has nothing intrinsically to do with intelligence, and of course 'dual N-back' should be easy for transformers -- they should have complete recall over their large context window.
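To make the "complete recall" point concrete: in dual N-back, a subject watches two simultaneous stimulus streams (say, a position and a letter) and must signal whenever the current stimulus matches the one from N steps earlier in either stream. For any system that retains the whole sequence, as a transformer's context window effectively does, the task reduces to a direct lookup. A minimal illustrative sketch (my own, not from the paper):

```python
def dual_n_back_hits(stream, n):
    """Return the indices where the current (position, letter) pair
    matches either component of the stimulus n steps earlier.

    With the full sequence available in memory, solving dual N-back
    is just indexing -- no working-memory load at all.
    """
    hits = []
    for i in range(n, len(stream)):
        pos, letter = stream[i]
        prev_pos, prev_letter = stream[i - n]
        if pos == prev_pos or letter == prev_letter:
            hits.append(i)
    return hits
```

The entire human difficulty of the task lies in holding the last N stimuli in working memory, which is exactly the constraint a large context window removes.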

Human intelligence tests are designed to measure variation in human intelligence -- it's silly to take those same isolated benchmarks and pretend they mean the same thing when applied to machines. Obviously a machine doing well on an IQ test doesn't mean that it will be able to do what a high IQ person could do in the messy real world; it's a benchmark, and it's only a meaningful benchmark because in humans IQ measures are designed to correlate with long-term outcomes and abilities.

That is, in humans, performance on these isolated benchmarks is correlated with our ability to exist in the messy real-world, but for AI, that correlation doesn't exist -- because the tests weren't designed to measure 'intelligence' per se, but human intelligence in the context of human lives.

stephendause · 2 months ago
This is a good insight, but do you know of better ways to measure machines' abilities to solve problems in the "messy real world"?
stephendause commented on A definition of AGI   arxiv.org/abs/2510.18212... · Posted by u/pegasus
anon35 · 2 months ago
> there's not really a destination. There is only the process of improvement

Surely you can appreciate that if the next stop on the journey of technology can take over the process of improvement itself that would make it an awfully notable stop? Maybe not "destination", but maybe worth the "endless conversation"?

stephendause · 2 months ago
I think it's not only the potential for self-improvement that makes AGI revolutionary. Even an AGI that one could clone at reasonable cost and set to work nonstop, alongside its clones, on any number of economically valuable problems would itself be revolutionary.
stephendause commented on SWE-Bench Pro   github.com/scaleapi/SWE-b... · Posted by u/tosh
gpt5 · 3 months ago
Slightly tangent question - they said that they have protected the public test set with a strong copyleft license to prevent training private models on them.

Does it actually work? Hasn't AI training so far simply ignored all license and copyright restrictions?

stephendause · 3 months ago
This is a key question in my opinion. It's one of the things that make benchmarking the SWE capabilities of LLMs difficult. It's usually impossible to know whether the LLM has seen a problem before, and coming up with new, representative problem sets is time-consuming.
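Beyond licensing, one mitigation some benchmarks use (BIG-bench is a well-known example) is embedding a unique canary string in every benchmark file, so that cooperating model trainers can filter those documents out of training corpora. A minimal sketch of the filtering side, using a hypothetical canary value:

```python
# Hypothetical canary marker; real benchmarks use a published GUID.
CANARY = "BENCHMARK-CANARY-26b5c67b"

def filter_training_docs(docs):
    """Drop any training document containing the canary string,
    so benchmark problems are excluded from the training corpus."""
    return [doc for doc in docs if CANARY not in doc]
```

Of course, this only helps with trainers who choose to cooperate, which is exactly the doubt raised above; the canary's other use is that probing a model for the string can reveal contamination after the fact.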
stephendause commented on US High school students' scores fall in reading and math   apnews.com/article/naep-r... · Posted by u/bikenaga
mrandish · 3 months ago
There's a longer trend but also a clear inflection point around the rise of mobile phones and social media. N=1 but we delayed getting a phone for our kid until a few months after she turned 13, which was a good choice because now we wish we'd gone longer. We can see how social media and app snacking clearly have negative effects on attention span, attitude, etc.

Also choosing to close schools during COVID was as catastrophic as many predicted. Our kid was in 7th grade during COVID and teachers each year report the effects are still being felt across many students. Of course, naturally great students recovered quickly and innately poor students remained poor but the biggest loss was in the large middle of B/C students.

stephendause · 3 months ago
Jonathan Haidt has a lot of good material on this. He is leading the charge in encouraging parents to delay giving their child a phone until high school and not allowing them to have social media accounts until age 16.

https://www.goodmorningamerica.com/family/story/author-sugge...

stephendause commented on Google debuts device-bound session credentials against session hijacking   feistyduck.com/newsletter... · Posted by u/speckx
prasadjoglekar · 4 months ago
The first sentence

> HTTP cookies were never intended for session management

Seems odd. IIRC that's exactly what they were meant for: state management for HTTP, which is stateless. Am I missing some history here?

stephendause · 4 months ago
I could be wrong, but I believe the author is referring to cookies being used for session authentication as opposed to general session management.
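The distinction matters because when a cookie carries a session ID that the server maps to an authenticated user, possession of the cookie alone proves identity, which is exactly why session hijacking works and what device-bound credentials aim to fix. A minimal sketch of that pattern using Python's standard library (illustrative only, not Google's mechanism):

```python
import secrets
from http import cookies

# In-memory session store: session ID -> authenticated user.
# Whoever presents the ID is treated as that user.
sessions = {}

def issue_session_cookie(username):
    """Create a session and return a cookie string with the usual
    hardening attributes set."""
    session_id = secrets.token_urlsafe(32)
    sessions[session_id] = username
    c = cookies.SimpleCookie()
    c["sid"] = session_id
    c["sid"]["httponly"] = True   # hide from JavaScript
    c["sid"]["secure"] = True     # send over HTTPS only
    c["sid"]["samesite"] = "Lax"  # limit cross-site sends
    return c.output(header="").strip()

def authenticate(cookie_header):
    """Resolve a Cookie header back to a user, if the session exists."""
    c = cookies.SimpleCookie(cookie_header)
    if "sid" in c:
        return sessions.get(c["sid"].value)
    return None
```

Note that `HttpOnly`, `Secure`, and `SameSite` all reduce how a cookie can leak, but none of them binds it to a device: a stolen `sid` value still authenticates from anywhere.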
stephendause commented on Babies made using three people's DNA are born free of mitochondrial disease   bbc.com/news/articles/cn8... · Posted by u/1659447091
mrweasel · 5 months ago
> Or adopt?

Adopt whom? There are almost no children available for adoption, only severely handicapped children who need an auxiliary family.

It might be easier with a donor egg, but where are you going to get that? Egg donation is highly regulated, and many would find it hard to get a donor. Of course, this technique also requires a donor egg, so you would already need one available.

stephendause · 5 months ago
> There are almost no children available for adoption

This is not true, at least in the United States. For one thing, there are many children in foster care who want to be adopted. It is also possible, though difficult and expensive, to adopt infants from mothers who are giving up their children for adoption. I am not saying it's an easy option or that everyone should do it, but it is an option.

stephendause commented on What Trump's Big Beautiful Bill means for Wi-Fi 6E and 7 users: It's not pretty   zdnet.com/home-and-office... · Posted by u/CrankyBear
stephendause · 5 months ago
I'd be interested to know how much precedent there is for this sort of situation. It seems like a certain portion of the spectrum was intended for one use, and now part of it might be sold off for another purpose, forcing the original users to adapt to the reversal. How often has this sort of thing happened?

u/stephendause

Karma: 530 · Cake day: December 20, 2020
About
SW engineer, adoptive parent, twin parent, interested in CS and software engineering as well as politics, philosophy, etc.