I think people care too much about trying to innovate on model architecture. A model is meant to create a compressed representation of its training data, so even if you came up with a more efficient compression, the capabilities of the model wouldn't be any better. What matters more is finding more efficient ways of training, like the current shift toward reinforcement learning.
But isn't the maximum training efficiency naturally tied to the architecture? Meaning a different architecture has a different training-efficiency landscape? I've said it elsewhere: it is not about "caring too much about new model architectures" but about striking a balance between exploitation and exploration.
It's like if someone invented the hamburger and every single food outlet decided to serve only hamburgers from that point on, spending all their time and money on making the perfect hamburger rather than on making great meals. That sounds ludicrously far-fetched, but it is exactly what happened here.