You can't jump to the endpoint because you don't know where it is - all you can compute is 'from where I am, which direction should my next step be.' This is also why the results for few-step diffusion are so poor - if you take big jumps over the velocity field you're only going in approximately the right direction, so you won't end up at a properly stable point which corresponds to a "likely" image.
To produce plots out of data files, Python and R are probably the best solutions.
Aside my concern, Dario Amodei seems really into politics. I have read a couple of his blog posts and listened to a couple of podcast interviews here and there. Every time I felt like he sounded more like a politician than an entrepreneur.
I know Anthropic is particularly more mission-driven than, say OpenAI. And I respect that their constitutional ways of training and serving Claude models. Claude turned out to be a great success. But reading a manifest speaking of wars and their missions, it gives me chills.
Also, if it's essentially a sort of metadata, can't the output generated image be replicated (e.g. screenshot) and thus stripped of any such data?
Similarly, if there are millions of academic papers and thousands of peer reviews in the training data, a review of this exact paper doesn't need to be in there for the LLM to write something convincing. (I say "convincing" rather than "correct" since, the author himself admits that he doesn't agree with all the LLM's comments.)
I tend to recommend people learn these things from first principles (e.g. build a small neural network, explore deep learning, build a language model) to gain a better intuition. There's really no "magic" at work here.
One of the surprises of deep learning is that it can, sometimes, defy prior statistical learning theory to generalise, but this is still poorly understood. Concepts like grokking, double descent, and the implicit bias of gradient descent are driving a lot of new research into the underlying dynamics of deep learning. But I'd say it is pretty ahistoric to claim that this is obvious or trivial - decades of work studied "overfitting" and related problems where statistical models fail to generalise or even interpolate within the support of their training data.
Other labs like Google have them but they have simply trailed the Pareto frontier for the vast majority of use cases
Here's more detail on how price/performance stacks up
Cerebras requires a $3K/year membership to use APIs.
Groq's been dead for about 6 months, even pre-acquisition.
I hope Inception is going well, it's the only real democratic target at this. Gemini 2.5 Flash Lite was promising but it never really went anywhere, even by the standards of a Google preview
Whenever I see people think the model architecture matters much, I think they have a magical view of AI. Progress comes from high quality data, the models are good as they are now. Of course you can still improve the models, but you get much more upside from data, or even better - from interactive environments. The path to AGI is not based on pure thinking, it's based on scaling interaction.
To remain in the same miasma theory of disease analogy, if you think architecture is the key, then look at how humans dealt with pandemics... Black Death in the 14th century killed half of Europe, and none could think of the germ theory of disease. Think about it - it was as desperate a situation as it gets, and none had the simple spark to keep hygiene.
The fact is we are also not smart from the brain alone, we are smart from our experience. Interaction and environment are the scaffolds of intelligence, not the model. For example 1B users do more for an AI company than a better model, they act like human in the loop curators of LLM work.
Just because RNNs and Transformers both work with enormous datasets doesn't mean that architecture/algorithm is irrelevant, it just suggests that they share underlying primitives. But those primitives may not be the right ones for 'AGI'.