Economics is important. Best bang for the buck seems to be OpenAI ChatGPT 4.1 mini[6]. Does a decent job, doesn't flood my context window with useless tokens like Claude does, API works every time. Gets me out of bad spots. Can get confused, but I've been able to muddle through with it.
E.g. if you need a self-contained script to do some data processing, Opus can often do that in one shot. A 500-line Python script costs around $1, and as long as it's not tricky it just works - you don't need any back-and-forth.
I don't think it's possible to employ any human to write a 500-line Python script for $1 (unless it's a free intern or a student), let alone have it done in one minute.
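The ~$1 figure is easy to sanity-check with back-of-the-envelope arithmetic. The token prices below are assumptions (roughly the published Opus rates of $15/M input and $75/M output tokens at the time of writing - check OpenRouter for current numbers), and 12 tokens per line of Python is a rough average:

```python
# Back-of-the-envelope cost estimate for a one-shot 500-line script.
# Prices per million tokens are ASSUMED values; verify against current rates.
INPUT_PRICE = 15.0    # $ per 1M input tokens (assumed)
OUTPUT_PRICE = 75.0   # $ per 1M output tokens (assumed)

lines = 500
tokens_per_line = 12                  # rough average for Python code
output_tokens = lines * tokens_per_line
prompt_tokens = 2_000                 # a modest task description

output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE
input_cost = prompt_tokens / 1_000_000 * INPUT_PRICE
total = output_cost + input_cost      # lands around half a dollar
```

Under these assumptions the total comes out well under a dollar, so the "$1 for 500 lines" claim is in the right ballpark even with a larger prompt.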
Of course, if you use LLM interactively, for many small tasks, Opus might be too expensive, and you probably want a faster model anyway. Really depends on how you use it.
(You can do quite a lot in file-at-once mode. E.g. Gemini 2.5 Flash could write 35 KB of code for a full ML experiment in Python - self-contained with data loading, model setup, training, and evaluation, all in one file, pretty much on the first try.)
The pronunciation sounds about right - I thought that was the hard part, and the model does it well. But voice timbre should be simpler to fix. Like, a simple FIR filter might improve it?
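A minimal sketch of what that might look like, assuming the timbre problem is a fixed spectral coloration: design an FIR equalizer from a desired frequency response with `scipy.signal.firwin2` and apply it with `lfilter`. The specific band and gain values here are made-up placeholders; a real fix would fit them to the difference between the model's spectrum and a reference voice.

```python
import numpy as np
from scipy.signal import firwin2, lfilter

# Hypothetical EQ curve: gently attenuate a harsh-sounding band.
# Frequencies are normalized to Nyquist (1.0 = 8 kHz at 16 kHz sample rate);
# gains are linear amplitude. These breakpoints are illustrative only.
freqs = [0.0, 0.20, 0.25, 0.50, 0.55, 1.0]
gains = [1.0, 1.00, 0.70, 0.70, 1.00, 1.0]
taps = firwin2(101, freqs, gains)       # 101-tap linear-phase FIR

audio = np.random.randn(16000)          # stand-in for 1 s of synthesized speech
equalized = lfilter(taps, 1.0, audio)   # filtered signal, same length
```

A single FIR like this can only fix stationary coloration; timbre problems that vary with phoneme or pitch would need something adaptive.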
My understanding is that the former (sucking up) is a personality trait, substantially influenced by the desire to facilitate engagement. The latter (making up facts) I do not think is correct to ascribe to a personality trait (like being a compulsive liar); instead, it is because the fitness function of LLMs drives them to produce some answer: they do not know what they're talking about, but produce strings of text based on statistics.
An LLM can be trained to produce "I don't know" when its confidence in other answers is weak (e.g. weak or mixed signals). A persona vector can also nudge it in that direction.
Facebook shouldn't legally be allowed to demand an ID any more than this disaster of an "app."
Now tens of thousands of people will be subject to identity theft because someone thought this was a neat growth hacking pattern for their ethically dubious idea of a social networking site.
It can be done with fairly basic cryptography. But the infrastructure around it will only grow if there's demand. Otherwise people go with the lowest common denominator.
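To illustrate the "fairly basic cryptography" point: a trusted issuer who checks an ID once can hand out a signed attestation (e.g. "over 18"), and sites verify the signature without ever seeing the ID. The sketch below is a deliberately minimal stand-in using an HMAC; a real system would use asymmetric keys so verifiers can't mint tokens, plus revocation and unlinkability, all elided here.

```python
import hmac
import hashlib

# Toy attestation scheme. ISSUER_KEY is a placeholder shared secret;
# a real deployment would use an asymmetric key pair instead.
ISSUER_KEY = b"issuer-secret"

def issue_token(user_id: str) -> str:
    """Issuer signs a claim after verifying the user's ID once."""
    claim = f"{user_id}:over18"
    sig = hmac.new(ISSUER_KEY, claim.encode(), hashlib.sha256).hexdigest()
    return f"{claim}:{sig}"

def verify_token(token: str) -> bool:
    """A site checks the signature without ever handling the ID itself."""
    claim, _, sig = token.rpartition(":")
    expected = hmac.new(ISSUER_KEY, claim.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)
```

The point is that the site only learns "over 18, says a trusted issuer" - no ID scan ever reaches the service, so there's nothing to leak.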
The learning curve might be a bit steep, but afterwards everything makes sense. And let's be honest, you only need a few actions (pull, add, reset, branch, commit) for 95% of the cases.
A lot of people are religious about rebasing and a "clean" commit history. But that's pretty much incompatible with several devs working on a single branch. I.e. when you work on something complex, perhaps under time pressure, git habits bite you in the ass. It's not fine.
Does that mean the LLMs realized they could not solve it? I thought that was one of the limitations of LLMs: they don't know what they don't know, and it is really impossible without a solver to know the consistency of an argument, i.e., to know that one knows.
You can do a lot of things on top: e.g. train a linear probe to give a confidence score. Yes, it won't be 100% reliable, but it might be good enough if you constrain it to a domain like math.
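A minimal sketch of that linear-probe idea, under the assumption that you have hidden-state activations for answers whose correctness you already know (the data here is synthetic, with the label planted in one feature purely so the example runs):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for probe training data: in practice these would be
# model activations paired with graded correct/incorrect answers.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(200, 64))        # 200 answers, 64-dim activations
correct = (hidden_states[:, 0] > 0).astype(int)   # toy label with a linear signal

# The probe itself: a plain logistic regression over the activations.
probe = LogisticRegression(max_iter=1000).fit(hidden_states, correct)
confidence = probe.predict_proba(hidden_states)[:, 1]  # per-answer confidence

# Abstention rule: emit "I don't know" when the probe's confidence is weak.
abstain = confidence < 0.6
```

On real activations you'd hold out a test set and calibrate the 0.6 threshold per domain; the probe tends to work better when the domain is narrow, which is the constraint mentioned above.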
- He's a courseboi who sells a community that promises to 'Get from $0 to $100 Million in ARR'
- The stuff about 'it was during a code freeze' doesn't make sense. What does 'code freeze' even mean when you're working alone, vibe coding, and asking the agent to do things?
- Yes, LLMs hallucinate. The guy seems smart, and I guess he knows it. Yet he deliberately drives up the emotional side of everything, saying that Replit "fibbed" and "lied" because it created tests that didn't work.
- He had a lot of tweets saying that there was no rollback, because the LLM doesn't know about the rollback - which is expected. He managed to roll back the database using Replit's rollback functionality[0], but still really milks the 'it deleted my production database' angle.
- It looks like this was a thread about vibe coding daily. This was day 8. So this was an app in very early development and the 'production' database was probably the dev database?
Overall just looks like a lot of attention seeking to me.
[0] https://x.com/jasonlk/status/1946240562736365809 "It turns out Replit was wrong, and the rollback did work."