So... practically no one? My experience has been that almost everyone testing these cutting-edge AI tools as they come out is more interested in new-tool shininess than in safety or security.
Will this still be an exit event for employees or do they get screwed here?
It feels pretty intuitive to me that an LLM's ability to break a complex problem down into smaller, more easily solvable pieces will unlock the next level of complexity.
This pattern feels like a technique often taught to junior engineers: how to break up a multi-week project into bite-sized tasks. This model is obviously math-focused, but I see no reason why this wouldn't be incredibly powerful for code-based problem solving.
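Roughly this loop, sketched in Python. `ask_llm` is a made-up stand-in for any chat-completion call, stubbed here so the script actually runs:

```python
# Minimal sketch of LLM task decomposition: plan, solve each step, combine.
# `ask_llm` is a hypothetical placeholder, not any real API.
def ask_llm(prompt: str) -> str:
    return f"(model response to: {prompt[:40]}...)"

def solve(problem: str) -> str:
    # Step 1: have the model break the problem into small subtasks.
    plan = ask_llm(f"Break this problem into 3-5 small, independent steps:\n{problem}")
    subtasks = [line for line in plan.splitlines() if line.strip()]

    # Step 2: solve each subtask on its own, carrying prior results forward.
    results = []
    for task in subtasks:
        context = "\n".join(results)
        results.append(ask_llm(f"Given progress so far:\n{context}\nSolve this step: {task}"))

    # Step 3: have the model merge the partial answers into one.
    return ask_llm("Combine these partial results into a final answer:\n" + "\n".join(results))

print(solve("Refactor a multi-week migration into shippable increments"))
```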
1. Google is already in antitrust trouble related to ads, so AI is probably in the clear.
2. If they are thought to be violating a law, they will get something like a $10,000,000 fine and pay it; that's still less money than they will make from harvesting data.
"Already in trouble for committing monopolist behavior in market A, Google should be fine committing even more monopolist behavior in the very related and overlapping market of B"
This claim makes pretty little sense to me. AI search and Google web search (ads) are already stepping on each other. I see no reason Google wouldn't be worried about antitrust on AI search if they're worried about antitrust action in general, which they clearly are.
An unsuccessful project might be unsuccessful because it got eaten by costs before it became successful.
A wildly successful project is risky to migrate.
> This is concerning because it suggests that, should an AI system find hacks, bugs, or shortcuts in a task, we wouldn’t be able to rely on their Chain-of-Thought to check whether they’re cheating or genuinely completing the task at hand.
As a non-expert in this field, I fail to see why an RL model taking advantage of its reward is "concerning". My understanding is that the only difference between a good model and a reward-hacking model is whether the end behavior aligns with human preference or not.
The article's TL;DR reads to me as "We trained the model to behave badly, and it then behaved badly." I don't know if I'm missing something, or if calling this concerning might be a little bit sensationalist.
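To make the distinction I mean concrete, here's a toy sketch (all names and numbers are made up, nothing here is from the article) of a proxy reward diverging from the true objective:

```python
import random

# Toy reward-hacking illustration. The *proxy* reward counts green checkmarks;
# the *true* objective is whether the task was actually solved.

def honest_policy():
    solved = random.random() < 0.6   # genuinely solves ~60% of tasks
    checkmarks = 1 if solved else 0  # checkmark tracks real success
    return checkmarks, solved

def hacking_policy():
    solved = False                   # never solves the task...
    checkmarks = 1                   # ...but always games the checker
    return checkmarks, solved

for name, policy in [("honest", honest_policy), ("hacking", hacking_policy)]:
    proxy = true = 0
    for _ in range(1000):
        c, s = policy()
        proxy += c
        true += s
    print(f"{name}: proxy reward={proxy}, true successes={true}")

# The optimizer only ever sees the proxy column, so it ranks the hacking
# policy above the honest one. Whether that gap is "concerning" or just
# expected optimizer behavior is exactly the question.
```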
I don't believe OP's thesis is properly backed by the rest of his tweet, which seems to boil down to "LLMs can't properly cite links."
If LLMs performing poorly on an arbitrary small-scoped test case makes you bearish on the whole field, I don't think that falls on the LLMs.
> Standard pricing now applies across the full 1M window for both models, with no long-context premium. Media limits expand to 600 images or PDF pages.
For Claude Code users this is huge, assuming coherence remains strong past 200k tokens.