A modern self-referential weight matrix that learns to modify itself

heyitsguay · 3 years ago

I know Schmidhuber is famously miffed for missing out on the AI revolution limelight, and despite that he runs a pretty famous and well-resourced group. So with a paper like this demonstrating a new fundamental technique, you'd think they would eat the labor and compute costs of getting this up and running on a full gauntlet of high-profile benchmarks, in comparison with existing SOTA methods, vs the sort of half-hearted benchmarking that happens in this paper. It's a hassle, but all it would take for something like this to catch the community's attention would be a clear demonstration of viability in line with what groups at any of the other large research institutions do.

The failure to put something like that front and center makes me wonder how strong the method is, because you have to assume that someone on the team has tried more benchmarks. Still, the idea of learning a better update rule than gradient descent is intriguing, so maybe something cool will come from this :)

nullc · 3 years ago

Or they hurried the publication to avoid getting scooped and will follow up with interesting benchmarks later.

taneq · 3 years ago

If it’s really that new and different, maybe it’d be a little premature and even misleading to present the sort of full sweep you suggest. People are much better at pooh-poohing new ideas than at accurately assessing their potential.

P-NP · 3 years ago

"miffed for missing out on the AI revolution limelight?" Despite all those TV docs and newspaper articles about him? :-)

ricardobayes · 3 years ago

It's a super weird feeling to click on a hacker news top post and find out I know one of the authors. The world is a super small place.

watersb · 3 years ago

First, congratulations! It's a paper worth HN attention. Very cool.

Second: do Hacker News posts form a small-world network? I don't know. I don't even know if my question is well posed (it might be a meaningless question). Does the set of Hacker News articles change over time in ways that resemble annealing or self-training matrices? (likewise, I question this question, but I wonder.)

https://en.m.wikipedia.org/wiki/Small_world_network

goodmattg · 3 years ago

Need time to digest this paper, but you can assume if it's from Schmidhuber's group it will have some impact, even if only intellectual.

TekMol · 3 years ago

I have been playing with alternative ways to do machine learning on and off for a few years now. Some experiments went very well.

I am never sure if it is a waste of time or has some value.

If you guys had some unique ML technology that is different to what all the others do, what would you do with it?

drewm1980 · 3 years ago

Start with the assumption that someone has already done it... Do a thorough literature survey... Ask experts working on the most similar thing. Don't be disheartened if you weren't the first; ideas don't have to be original to have value; some ideas need reviving from time to time, or were ahead of their time when first discovered.

Szpadel · 3 years ago

ML is still a fairly new topic, and if you have an idea there is a high chance that nobody has actually tried it yet.

swagasaurus-rex · 3 years ago

Create a demo of it doing -something-. Literally anything. Then show it off and see where it goes.

hwers · 3 years ago

Write a paper about it. Post it on arxiv.org. Contact some open-minded researchers on Twitter or here (Show HN) for critique.

nitrogen · 3 years ago

You have to be affiliated with an institution to submit to arxiv.

jah242 · 3 years ago

Sounds like we are in very similar positions and have a very similar question :). My only real plan so far is to try to beat or match SOTA on a recent benchmark from a large corporate / research lab, send them an email, and hope they are willing to talk.

daveguy · 3 years ago

A demo speaks louder than words. If you don't want to go into the details of how it works, it would still be interesting to just see where it over- and underperforms compared to existing systems.

mark_l_watson · 3 years ago

Absolutely! Also, if possible, a Colab (or plain Jupyter notebook) and data would be good.

nynx · 3 years ago

I’d make a blog and post about my experiments.

andai · 3 years ago

And a video too, please :)

javajosh · 3 years ago

Host it on a $5 VPS with full internet access and "see what happens".

ggerganov · 3 years ago

I would make a "Show HN" post

Eliezer · 3 years ago

Don't burn the capabilities commons. You probably don't have anything, in which case, why bother people? If you do have something, that advances AI capabilities and shortens the time before AGI; and while nobody actually has anything resembling a viable plan for surviving that, the fake plans tend to rely on having more time rather than less time.

voldacar · 3 years ago

A bit LARPy, don't you think?

ur-whale · 3 years ago

> what would you do with it?

Use the "proof is in the pudding" method:

Do something with it - preferably useful - that no one else can.

Scene_Cast2 · 3 years ago

If you do end up posting any sort of musings on this topic, I'd be really interested in taking a look.

mark_l_watson · 3 years ago

I haven't really absorbed this paper yet, but my first thought was of the Hopfield networks we used in the 1980s.

For unsupervised learning algorithms like masked models (BERT and some other Transformers), it makes sense to train in parallel with prediction. Why not?

My imagination can't wrap around using this for supervised (labeled data) learning.

codelord · 3 years ago

I haven't read the paper yet, so no comment on the content. But it's amusing that more than 30% of the references are self-citations.

lol1lol · 3 years ago

Hinton et al. self cite. Schmidhuber et al. self cite. One got Turing, the other got angry.

savant_penguin · 3 years ago

Just skimmed the paper but the benchmarks are super weird

jdeaton · 3 years ago

I'm having a hard time reading this paper without hearing you-again's voice in my head.