I like Apple, so I’m really hoping they bring on someone to solve this. Otherwise they’re on track to be the same as every other tasteless tech company.
More on taste and Apple: https://www.readtrung.com/p/steve-jobs-rick-rubin-and-taste
For example, I do not see the full system prompt anywhere, only an excerpt. More importantly, they try to draw conclusions about the hallucinations in a weirdly vague way, yet not once do they post an example of the note-taking/memory tool state, which obviously would be the only source of the spiralling other than the system prompt. And then they talk about the need for better tools etc. No, it's all about context. The whole experiment is fun, but terribly run and analyzed. Of course they know this, but it's cooler to treat Claudius or whatever as a cute human, to push the narrative of getting closer to AGI etc. Saying a bit of additional scaffolding is needed is a massive understatement. Context is the whole game. It's as if a robotics company said "well, our experiment with a robot picking a tennis ball off the ground went very wrong and the ball is now radioactive, but with a bit of additional training and scaffolding, we expect it to compete at Wimbledon by mid 2026".
Similar to their "Claude 4 Opus blackmailing" post, they intentionally obscured the full system prompt, which had clear instructions to bypass any ethical guidelines etc. and do whatever it could to win. Of course the model, given that information, would then immediately try to blackmail. You literally told it to. The goal of this would be to go to Congress [1] and demand more regulation, specifically mentioning this blackmail "result". Same stuff that Sam is trying to pull, which would of course benefit the closed-source leaders, and so on.
[1]https://old.reddit.com/r/singularity/comments/1ll3m7j/anthro...
The section on the identity crisis was particularly interesting.
Mainly, it left me with more questions. In particular, I would have been really interested to experiment with having a trusted human in the loop to provide feedback and monitor progress. Realistically, it seems like these systems would be grown that way.
I once read an article about a guy who had purchased a Subway franchise, and one of the big conclusions was that running a Subway franchise was _boring_. So, I could see someone being eager to delegate the boring tasks of daily business management to an AI at a simple business.
They have been aware for a few years that many clinicians aren’t documenting their work in the best way for billing. The current solution is to have an annual talk given by the one billing expert in their department pointing out where people often lose revenue due to poor documentation.
Not all the doctors attend this talk. There is no internal process for measuring subsequent improvements quantitatively. There are 85 doctors in her group.
Anyway, this is just to say that something automated to help doctors document their work in a billing-friendly way seems powerful. But for my wife’s group, the issue doesn’t seem to be denied claims or “errors” per se. It's more omissions and suboptimal documentation due to lack of knowledge, or lack of follow-through on knowledge that is only occasionally communicated.
For a tool that radically increases productivity (say 2x), I think it could still make sense for a VC funded startup or an established company (even $100/day or $36k/year is still a lot less than hiring another developer). But for a side project or bootstrap effort, $36k/year obviously significantly increases cash expenses. $100/month does not, however.
So, I'm going to go back and upgrade to Max and try it again. If that keeps my costs to $100/month, that's a really different value proposition.
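For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch. It assumes the $100/day figure is spent every calendar day (365 days) and that the flat plan really stays at $100/month with no overage; both are simplifications, and the function names are made up for illustration.

```python
# Rough yearly cost comparison using the figures from the comment above.
# Assumptions (not from any pricing page): usage on all 365 days,
# and a flat plan that never exceeds its monthly price.
DAYS_PER_YEAR = 365
MONTHS_PER_YEAR = 12


def yearly_cost_from_daily(daily_usd: float) -> float:
    """Yearly spend if the same amount is spent every day."""
    return daily_usd * DAYS_PER_YEAR


def yearly_cost_from_monthly(monthly_usd: float) -> float:
    """Yearly spend on a flat monthly plan."""
    return monthly_usd * MONTHS_PER_YEAR


pay_per_use_yearly = yearly_cost_from_daily(100)    # $100/day
flat_plan_yearly = yearly_cost_from_monthly(100)    # $100/month
ratio = pay_per_use_yearly / flat_plan_yearly

print(f"pay-per-use: ${pay_per_use_yearly:,.0f}/year")
print(f"flat plan:   ${flat_plan_yearly:,.0f}/year")
print(f"ratio:       {ratio:.1f}x")
```

Under those assumptions the gap is roughly thirty-fold, which is why the two options land so differently for a bootstrapped project.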
If executives / high-level architects / researchers are working on this quarter's features, something is very wrong. The higher you get, the further ahead you need to be working. At a company of this size, C-level departures should only have an impact about a year down the line.
I keep wondering why. Every project I have ever seen needed lines of code, nuts, and bolts removed rather than added. My best libraries consist of a couple of thousand lines.