The surprising effectiveness of test-time training for abstract reasoning [pdf]

mikeknoop · a year ago

Context: ARC Prize 2024 just wrapped up yesterday. ARC Prize's goal is to be a north star towards AGI. This year's progress seems to fall into two major categories: "program synthesis" and "test-time fine-tuning". Both techniques are used by DeepMind's impressive AlphaProof system [1]. And I'm personally excited to finally see actual code implementations of these ideas [2]!

We still have a long way to go for the grand prize -- we'll be back next year. Also got some new stuff in the works for 2025.

Watch for the official ARC Prize 2024 paper coming Dec 6. We're going to give an overview of all the new AI reasoning code and approaches open-sourced via the competition [3].

[1] https://deepmind.google/discover/blog/ai-solves-imo-problems...

[2] https://github.com/ekinakyurek/marc

[3] https://x.com/arcprize

aithrowawaycomm · a year ago

I am a bit uncertain about the rules of the ARC-AGI contest, but would this program count? A good chunk of the logic of ARC is essentially hardcoded, including a Python function that checks whether or not the proposed solution makes sense.

The point of the contest is to measure intelligence in general-purpose AI systems: it does not seem in the spirit of the contest that this AI would completely fail if the test was presented on a hexagonal grid.

0x1064 · a year ago

The point of the contest is to measure an algorithm's ability to solve ARC problems specifically; no one believes that it's general-purpose AI. They're highly contrived problems by design.

razodactyl · a year ago

The majority of ARC can be gamed or hard-coded, no doubt about it.

The real pressure is the private hold-out set and the variations that can be added to counter this aspect.

A true AGI would be able to solve anything thrown at it, which is where the authors are trying to lead AI engineering, since LLMs have pretty much taken over.

If it starts getting too easy, they just reconsider and add harder problems.

It's like how we don't talk about the Turing Test anymore as it's no longer the best metric to determine real intelligence.

The authors are signalling to the industry that new ideas are needed and the monetary aspect is to show how serious they are about it.

It's good because, as per above, we have research being thrown at it, which means we can iterate until we perhaps find another breakthrough.

benchmarkist · a year ago

The contest is misnamed; solving ARC will not get us any closer to AGI.

arjvik · a year ago

Test-Time Training is incredibly powerful. Most recently, it has been shown that Self-Attention can in fact be viewed through the lens of test-time training, with a kernel smoother "learning" from context. Simply replacing the kernel smoother with a more powerful model results in very capable and scalable models!

https://arxiv.org/abs/2407.04620
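
For intuition on the kernel-smoother view, here is a minimal sketch in plain Python. It is not code from the linked paper, and the function names (softmax_attention, kernel_smoother, exp_dot_kernel) are made up for illustration. It only checks that single-query softmax attention gives the same output as a Nadaraya-Watson kernel smoother with an exponential dot-product kernel; it does not implement the paper's more powerful replacements.

```python
import math


def softmax_attention(query, keys, values):
    """Single-query scaled dot-product attention over lists of floats."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Subtract the max score for numerical stability; the weights are unchanged.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]


def kernel_smoother(query, keys, values, kernel):
    """Nadaraya-Watson estimator: kernel-weighted average of the values."""
    k_vals = [kernel(query, key) for key in keys]
    total = sum(k_vals)
    dim_v = len(values[0])
    return [sum(kv * v[j] for kv, v in zip(k_vals, values)) / total for j in range(dim_v)]


def exp_dot_kernel(query, key):
    """Exponential of the scaled dot product (an assumed kernel choice)."""
    d = len(query)
    return math.exp(sum(q * k for q, k in zip(query, key)) / math.sqrt(d))


# Toy context: three (key, value) pairs and one query.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0], [2.0], [3.0]]
query = [0.5, -0.5]

attn_out = softmax_attention(query, keys, values)
smooth_out = kernel_smoother(query, keys, values, exp_dot_kernel)
```

The two outputs agree up to floating-point error, because the softmax weights are exactly the normalized kernel values. In this view the "training data" is the context's key/value pairs, and the "model" fit to it at test time is just a weighted average.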

sthlmb · a year ago

I initially read that as "Tea-Time" training and my inner Brit got a little excited..

antonvs · a year ago

We won't achieve true AGI until the AGIs are demanding second breakfast.

zbyforgotp · a year ago

Is test time the same thing as inference time?

whoisnnamdi · a year ago

yes