Personally, I also think the syntax is a little verbose: for a generic shape hint you need something like `Shaped[Array, "m n"]`. But 95% of the time I only really care about the shape "m n". It doesn't sound like much, but I recently tried hinting a codebase with jaxtyping and gave up because it was adding so much visual clutter, without clear benefits.
A follow-up question: Google's old `tensor_annotations` library (RIP) could statically analyse operations - eg. `reduce_sum(Tensor[Time, Batch], axis=0) -> Tensor[Batch]`. I guess that wouldn't come with static analysis for jaxtyping?
I am happy to ponder and willingly accept this is probably just my perception.
I have a couple of theories. The creators of the media are becoming more and more my age. Do they have nothing interesting to say to me as our experience is shared? Is this something experienced by previous generations as their generation took over media, or is our zeitgeist as "digital natives" so newly shared that this is a new experience?
I know people who would blame "ensh*tification" and move on, but I really think that there is more to what is happening.
What I do know is it's exceedingly rare for me to watch a movie or show made after about 2015 and to find myself thinking about it days later. There are of course exceptions.
Whenever I see people think the model architecture matters much, I think they have a magical view of AI. Progress comes from high quality data, the models are good as they are now. Of course you can still improve the models, but you get much more upside from data, or even better - from interactive environments. The path to AGI is not based on pure thinking, it's based on scaling interaction.
To remain in the same miasma theory of disease analogy, if you think architecture is the key, then look at how humans dealt with pandemics... Black Death in the 14th century killed half of Europe, and none could think of the germ theory of disease. Think about it - it was as desperate a situation as it gets, and none had the simple spark to keep hygiene.
The fact is we are also not smart from the brain alone, we are smart from our experience. Interaction and environment are the scaffolds of intelligence, not the model. For example 1B users do more for an AI company than a better model, they act like human in the loop curators of LLM work.
Just because RNNs and Transformers both work with enormous datasets doesn't mean that architecture/algorithm is irrelevant, it just suggests that they share underlying primitives. But those primitives may not be the right ones for 'AGI'.
You can't jump to the endpoint because you don't know where it is - all you can compute is 'from where I am, which direction should my next step be.' This is also why the results for few-step diffusion are so poor - if you take big jumps over the velocity field you're only going in approximately the right direction, so you won't end up at a properly stable point which corresponds to a "likely" image.