ARC Prize – a $1M+ competition towards open AGI progress

This is super cool. I share Francois' intuition that the presently data-hungry learning paradigm is not only not generalizable but unsustainable: humans do not need 10,000 examples to tell the difference between cats and dogs, and the main reason computers can today is because we have millions of examples. As a result, it may be hard to transfer knowledge to more esoteric domains where data is expensive, rare, and hard to synthesize.

If I can make one criticism/observation of the tests, it seems that most of them reason about perfect information in a game-theoretic sense. However, many if not most of the more challenging problems we encounter involve hidden information. Poker and negotiations are examples of problem solving in imperfect information scenarios. Smoothly navigating social situations also requires a related problem of working with hidden information.

One of the really interesting things we humans are able to do is to take the rules of a game and generate strategies. While we do have some algorithms which can "teach themselves" e.g. to play go or chess, those same self-play algorithms don't work on hidden information games. One of the really interesting capabilities of any generally-intelligent system would be synthesizing a general problem solver for those kinds of situations as well.

com2kid · a year ago

> humans do not need 10,000 examples to tell the difference between cats and dogs,

I swear, not enough people have kids.

Now, is it 10k examples? No, but I think it was on the order of hundreds, if not thousands.

One thing kids do is they'll ask for confirmation of their guess. You'll be reading a book you've read 50 times before and the kid will stop you, point at a dog in the book, and ask "dog?"

And there is a development phase where this happens a lot.

Also kids can get mad if they are told an object doesn't match up to the expected label, e.g. my son gets really mad if someone calls something by the wrong color.

Another thing toddlers like to do is play silly labeling games, which is different than calling something the wrong name on accident, instead this is done on purpose for fun. e.g. you point to a fish and say "isn't that a lovely llama!" at which point the kid will fall down giggling at how silly you are being.

The human brain develops really slowly[1], and a sense of linear time encoding doesn't really exist for quite awhile. (Even at 3, everything is either yesterday, today, or tomorrow) so who the hell knows how things are being processed, but what we do know is that kids gather information through a bunch of senses, that are operating at an absurd data collection rate 12-14 hours a day, with another 10-12 hours of downtime to process the information.

[1] Watch a baby discover they have a right foot. Then a few days later figure out they also have a left foot. Watch kids who are learning to stand develop a sense of "up above me" after they bonk their heads a few time on a table bottom. Kids only learn "fast" in the sense that they have nothing else to do for years on end.

PheonixPharts · a year ago

> Now, is it 10k examples? No, but I think it was on the order of hundreds, if not thousands.

I have kids so I'm presuming I'm allowed to have an opinion here.

This is ignoring the fact that babies are not just learning labels, they're learning the whole of language, motion planning, sensory processing, etc.

Once they have the basics down concept acquisition time shrinks rapidly and kids can easily learn their new favorite animal in as little as a single example.

Compare this to LLMs which can one-shot certain tasks, but only if they have essentially already memorized enough information to know about that task. It gives the illusion that these models are learning like children do, when in reality they are not even entirely capable of learning novel concepts.

Beyond just learning a new animal, humans are able to learn entirely new systems of reasoning in surprisingly few examples (though it does take quite a bit of time to process them). How many homework questions did your entire calc 1 class have? I'm guessing less than 100 and (hopefully) you successfully learned differential calculus.

9cb14c1ec0 · a year ago

> not enough people have kids.

Second that. I think I've learned as much as my children have.

> Watch a baby discover they have a right foot. Then a few days later figure out they also have a left foot.

Watching a baby's awareness grow from pretty much nothing to a fully developed ability to understand the world around is one of the most fascinating parts of being a parent.

smusamashah · a year ago

My kid is about 3 and has been slow on language development. He can barely speak a few short sentences now. Learning names of things and concepts made a big difference for him and that's a fascinating watch and realization.

This reminds of the story of Adam learning names, or how some languages can express a lot more in fewer words. And it makes sense that LLMs look intelligent to us.

My kid loves repeating the names of things he learned recently. For past few weeks, after learning 'spider' and 'snake' and 'dangerous' he keeps finding spiders around, no snakes so makes up snakes from curly drawn lines and tells us they are dangerous.

I think we learn fast because of stereo (3d) vision. I have no idea how these models learn and don't know if 3d vision will make multi model LLMs better and require exponentially less examples.

Nition · a year ago

> the kid will stop you, point at a dog in the book, and ask "dog?"

Of course for a human this can either mean "I have an idea about what a dog is, but I'm not sure whether this is one" or it can mean "Hey this is a... one of those, what's the word for it again?"

llm_trw · a year ago

Babies, unlike machine learning models, aren't placed in limbo when they aren't running back propagation.

Babies need few examples for complex tasks because they get constant infinitely complex examples on tasks which are used for transfer learning.

Current models take a nuclear reactors worth of power to run back prop on top of a small countries GDP worth of hardware.

They are _not_ going to generalize to AGI because we can't afford to run them.

1024core · a year ago

> I swear, not enough people have kids.

My friends toddler, who grew up with a cat in the house, would initially call all dogs "cat". :-D

resource0x · a year ago

I haven't seen 1000 cats in my entire life. I'm sure I learned how to tell a dog from a cat after being exposed to just a single instance of each.

cess11 · a year ago

I have a small kid. When they first saw some jackdaws, the first bird they noticed could fly, they thought it was terribly exciting and immediately learned the word for them, and generalised it to geese, crows, gulls and magpies (plus some less common species I don't know what they're called in english), pointing at them and screaming the equivalent of 'jackda! jackda!'.

Deleted Comment

PontifexMinimus · a year ago

> Now, is it 10k examples? No, but I think it was on the order of hundreds, if not thousands.

If I was presented with 10 pictures of 2 species I'm unfamiliar with, about as different as cats and dogs, I expect I would be able to classify further images as either, reasonably accurately.

ein0p · a year ago

Not to mention that babies receive petabytes of visual input to go with other stimuli. It’s up for debate how sample efficient humans actually are in the first few years of their lives.

AuryGlenz · a year ago

That’s all true, yet my 2.5 year old sometimes one-shots specific information. I told my daughter that woodpeckers eat bugs out of trees after doing what you said and asking “what’s that noise?” for the fifth time in a few minutes when we heard some this spring. She brought it up again at least a week later, randomly. Developing brains are amazing.

She also saw an eagle this spring out the car window and said “an eagle! …no, it’s a bird,” so I guess she’s still working on those image classifications ;)

bamboozled · a year ago

I think your comment over intellectualises the way children experience the world.

My child experiences the world in a really pure way. They don’t care much about labels or colours or any other human inventions like that. He picks up his carrot, he doesn’t care about the name or the color . He just enjoys it through purely experiencing eating it. He can also find incredible flow state like joy from playing with river stones or looking at the moon.

I personally feel bad I have to each them to label things and but things in boxes. I think your child is frustrated at times because it’s a punish of a game. The departure from “the oceanic feeling.

Your comment would make sense to me if the end game of our brains and human experience is labelling things. It’s not. It’s useful but it’s not what living is about.

theptip · a year ago

> humans do not need 10,000 examples to tell the difference between cats and dogs

The optimization process that trained the human brain is called evolution, and it took a lot more than 10,000 examples to produce a system that can differentiate cats vs dogs.

Put differently, an LLM is pre-trained with very light priors, starting almost from scratch, whereas a human brain is pre-loaded with extremely strong priors.

PaulDavisThe1st · a year ago

> The optimization process that trained the human brain is called evolution, and it took a lot more than 10,000 examples to produce a system that can differentiate cats vs dogs.

Asserted without evidence. We have essentially no idea at what point living systems were capable of differentiating cats from dogs (we don't even know for sure which living systems can do this).

llm_trw · a year ago

>The optimization process that trained the human brain is called evolution

A human brain that doesn't get visual stimulus at the critical age between 0 and 3 years old will never be able to tell the difference between a cat and a dog because it will be forevermore blind.

Deleted Comment

pants2 · a year ago

Humans, I would bet, could distinguish between two animals they've never seen based only on a loose or tangential description. I.e. "A dog hunts animals by tracking and chasing them long enough to exhaust their energy, but a cat is opportunistic and strikes using stealth and agility."

A human that has never seen a dog or a cat could probably determine which is which based on looking at the two animals and their adaptations. This would be an interesting test for AIs, but I'm not quite sure how one would formulate a eval for this.

taneq · a year ago

Only after being exposed to (at least pictures and descriptions of) dozens if not hundreds of different types of animal and their different attributes. Literal decades of training time and carefully curated curriculum learning are required for a human to perform at what we consider ‘human level’.

ryankrage77 · a year ago

A possible way to this idea would be to draw two aliens with different hunting strategies and do a poll of which is which. I'd try it but my drawing skills are terrible and I'm averse to using generated images.

tigerlily · a year ago

Seems analogous to bouba/kiki effect:

https://en.m.wikipedia.org/wiki/Bouba/kiki_effect

jules · a year ago

Do computers need 10,000 examples to distinguish dogs from cats when pretrained on other tasks?

curious_cat_163 · a year ago

No.

VirusNewbie · a year ago

>: humans do not need 10,000 examples to tell the difference between cats and dogs

well, maybe. We view things in three dimensions at high fidelity: viewing a single dog or cat actually ends up being thousands of training samples, no?

amelius · a year ago

Yes, but we do not call a couch in a leopard print a leopard. Because we understand that the print is secondary to the function.

bbor · a year ago

Eh, still doesn’t hold up. I really don’t think there’s many psychologists working on the posited mechanism of simple NN-like backprop learning. Aka conditioning, I guess. As Chomsky reminds us every time we let him: human children learn to understand and use language — an incredibly complex and nuanced domain, to say the least — with shockingly little data and often zero-to-none intentional instruction. We definitely employ principles and patterns that are far more complex (more “emergent”?) than linear regression.

Tho I only ever did undergrad stats, maybe ML isn’t even technically a linear regression at this point. Still, hopefully my gist is clear

AIorNot · a year ago

There’s a great episode from Darkwish Patels podcast discussing this today

https://youtu.be/UakqL6Pj9xo?si=iDH6iSNyz1Net8j7

nphard85 · a year ago

Dwarkesh*

goertzen · a year ago

I don’t know enough of biology or genetics or evolution, but surely the millions of years of training that is hardcoded into our genes and expressed in our biology had much larger “training” runs.

allanrbo · a year ago

If a human eye works at say 10 fps, then 8 minutes with a cat is about 10k images :-D

captaincaveman · a year ago

I'd say that was more like a single instance, one interaction with a thing.

fennecbutt · a year ago

Humans don't need those examples because our brains are very pretrained. Natural fear of snakes and snakelike things, etc etc.

ML models are starting from absolute zero, single celled organism level.

Deleted Comment

woadwarrior01 · a year ago

> humans do not need 10,000 examples to tell the difference between cats and dogs

Neither do machines. Lookup few-shot learning with things like CLIP.

nextaccountic · a year ago

> humans do not need 10,000 examples to tell the difference between cats and dogs

Humans learn through a lifetime.

Or are we talking about newborn infants?

I really like the idea of ARC. But to me the problems seem like they require a lot of spatial world knowledge, more than they require abstract reasoning. Shapes overlapping each other, containing each other, slicing up and reassembling pieces, denoising regular geometric shapes, you can call them "core knowledge" but to me it seems like they are more like "things that are intuitive to human visual processing".

Would an intelligent but blind human be able to solve these problems?

I'm worried that we will need more than 800 examples to solve these problems, not because the abstract reasoning is so difficult, but because the problems require spatial knowledge that we intelligent humans learn with far more than 800 training examples.

modeless · a year ago

> to me it seems like they are more like "things that are intuitive to human visual processing".

Yann LeCun argues that humans are not general intelligence and that such a thing doesn't really exist. Intelligence can only be measured in specific domains. To the extent that this test represents a domain where humans greatly outperform AI, it's a useful test. We need more tests like that, because AIs are acing all of our regular tests despite being obviously less capable than humans in many domains.

> the problems require spatial knowledge that we intelligent humans learn with far more than 800 training examples.

Pretraining on unlimited amounts of data is fair game. Generalizing from readily available data to the test tasks is exactly what humans are doing.

> Would an intelligent but blind human be able to solve these problems?

I'm confident that they would, given a translation of the colors to tactile sensation. Blind humans still understand spatial relationships.

HarHarVeryFunny · a year ago

I just did the first 5 of the "public eval set" without having looked at the "public training set", and found them easy enough. If we're defining AGI as at least human level, then the AGI should also be able to do these without seeing any more examples.

I don't think there's any rules about what knowledge/experience you build into your solution.

mewpmewp2 · a year ago

AGI should obviously be able to do them. But AI being able to do those 100 percent wouldn't be evidence of AGI however. It is a very narrow domain.

nickpsecurity · a year ago

To parent: the spatial reasoning and blind person were great counterexamples. It still might be OK despite the blind exceptions if it showed general reasoning.

To OP: I like your project goal. I think you should look at prior, reasoning engines that tried to build common sense. Cyc and OpenMind are examples. You also might find use for the list of AGI goals in Section 2 of this paper:

https://arxiv.org/pdf/2308.04445

When studying intros of brain function, I also noted many regions tie into the hippocampus which might do both sense-neutral storage of concepts and make inner models (or approximations) of external world. The former helps tie concepts together through various senses. The latter helps in planning when we are imagining possibilities to evaluate and iterate on them.

Seems like AGI should have these hippocampus-like traits and those in the Cyc paper. One could test if an architecture could do such things in theory or on a small scale. It shouldn’t tie into just one type of sensory input either. At least two with the ability to act on what only exists in one or what is in both.

Edit: Children also have an enormous amount of unsupervised training on visual and spatial data. They get reinforcement through play and supervised training by parents. A realistic benchmark might similarly require GB of prettaining.

HarHarVeryFunny · a year ago

CYC was an expert system, which is arguably what LLMs are.

A similar vintage GOFAI project that might do better on these, with a suitable visual front end, is SOAR - a general purpose problem solver.

andoando · a year ago

I would argue that spatial reasoning encompasses all reasoning. All the things you mentioned have a direct analogue to abstract models and logic we employ and are engrained deeply into language. For example, shapes containing eachother:

There are two countries both which lay claim to the same territory. There is a set X that contains Y and there is a set Z that contains Y. In the case that the common overlap is 3D and one in on top of the other, we can extend this to there is a set X that contains -Y and a set Z that contains Y, and just as you can only see one on top and not both depending on where you stand, we can apply the same property here and say set X and Z cannot both exist, and therefore if set X is on then -Y and if set Z then Y.

If you pay attention to the language you use youll start to realize how much of it uses spatial relationships to describe completely abstract things. For example, one can speak of disintigrating hegonomic economies. i.e turning things built on top of eachother into nothing, to where it came

We are after all, reasoning about things which happen in time and space.

And spatial != visual. Even if you were blind youd have to reason spatially, because again any set of facts are facts in space-time. What does it take to understand history? People in space, living at various distances from each other, producing goods from various locations of the earth using physical processes, and physically exchanging them. To understand battles you have to understand how armies are arranged physically, how moving supplies works, weather conditions, how weapons and their physical forms affect what they can physically do, etc.

Hell LLMs, the largest advancement we had in artificial intelligence do what exactly? Encode tokens into multi dimensional space.

parentheses · a year ago

Spatial reasoning is easily isomorphic to many kinds of reasoning - just not all of them. Spatial reasoning in this case also limits the AI to 2 dimensions. I concede that with more dimensions, there will be more isomorphisms.

Is there a number of dimensions that captures all reasoning? I don't know..

CooCooCaCha · a year ago

“Would an intelligent but blind human be able to solve these problems?”

This is the wrong way to think about it IMO. Spatial relationships are just another type of logical relationship and we should expect AGI to be able to analyze relationships and generate algorithms on the fly to solve problems.

Just because humans can be biased in various ways doesn’t mean these biases are inherent to all intelligences.

crazygringo · a year ago

> Spatial relationships are just another type of logical relationship and we should expect AGI to be able to analyze relationships and generate algorithms on the fly to solve problems.

Not really. By that reasoning, 5-dimensional spatial reasoning is "just another type of logical relationship" and yet humans mostly can't do that at all.

It's clear that we have incredibly specialized capabilities for dealing with two- and three-dimensional spatiality that don't have much of anything to do with general logical intelligence at all.

janalsncm · a year ago

Part of the concern might be that visual reasoning problems are overrepresented in ARC in the space of all abstract reasoning problems.

It’s similar to how chess problems are technically reasoning problems but they are not representative of general reasoning.

dimask · a year ago

> Would an intelligent but blind human be able to solve these problems?

Blind people can have spatial reasoning just fine. Visual =/= spatial [0]. Now, one would have to adapt the colour-based tasks to something that would be more meaningful for a blind person, I guess.

[0] https://hal.science/hal-03373840/document

Lerc · a year ago

I don't think the intent is to learn the entire problem domain from the examples, but the specific rule that is being applied.

There may (almost certainly will be) additional knowledge encoded in the solver to cover the spacial concepts etc. The distinction with the AGI-ARC test is the disparity between human and AI performance, and that it focuses on puzzles that are easier for humans.

It would be interesting to see a finetuned LLM just try and express the rule for each puzzle as english. It could have full knowledge of what ARC-AGI is and how the tests operate, but the proof of the pudding is simply how it does on the test set.

lynx23 · a year ago

If a blind individual can solve a visually oriented challenge is not really a question of their intelligence but more a question of accessibility/translation. Just because I cant see something myself doesnt really say anything about my ability to deal with abstractions.