Readit News
danijar commented on DeepMind program finds diamonds in Minecraft without being taught   nature.com/articles/d4158... · Posted by u/Bender
YeGoblynQueenne · 5 months ago
Reinforcement learning is very good with games.

>> In Minecraft, the team used a protocol that gave Dreamer a ‘plus one’ reward every time it completed one of 12 progressive steps involved in diamond collection — including creating planks and a furnace, mining iron and forging an iron pickaxe.

And that is why it is never going to work in the real world: games have clear objectives with obvious rewards. The real world, not so much.

danijar · 5 months ago
For a lot of things, VLMs are good enough already to provide rewards. Give them the recent images and a text description of the task and ask whether the task was accomplished or not.

For a more general system, you can annotate videos with text descriptions of all the tasks that have been accomplished and when, then train a reward model on those annotations to run RL against later.
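As a rough sketch of the first idea (nothing from the paper; `query_vlm` is a hypothetical placeholder for whatever VLM API you have access to), the reward check could look something like this:

```python
# Hypothetical sketch: `query_vlm` stands in for whatever VLM API is available;
# it is not a real library call.

def vlm_reward(recent_frames, task_description, query_vlm):
    """Return 1.0 if the VLM judges the task accomplished, else 0.0."""
    prompt = (
        f"Task: {task_description}\n"
        "Looking at the recent frames, was this task accomplished? "
        "Answer only 'yes' or 'no'."
    )
    answer = query_vlm(images=recent_frames, text=prompt)
    return 1.0 if answer.strip().lower().startswith("yes") else 0.0
```

The second variant would swap the live VLM query for a small reward model trained on (clip, task description, accomplished-or-not) annotations.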

danijar commented on DeepMind program finds diamonds in Minecraft without being taught   nature.com/articles/d4158... · Posted by u/Bender
jonathanyc · 5 months ago
They write: "Below, we show uncut videos of runs during which Dreamer collected diamonds."

... but the first video only shows the player character digging downwards without using any tools and eventually dying in lava. What?

danijar · 5 months ago
It gets diamonds at 1:48 in the top-left video (you might need to go full screen to seek) [1].

The tools are admittedly really hard to see in the videos because of the timelapse, and the MP4 compression struggles a bit at the low resolution, but they are there :)

[1]: https://danijar.com/dreamerv3/

danijar commented on DeepMind program finds diamonds in Minecraft without being taught   nature.com/articles/d4158... · Posted by u/Bender
ninetyninenine · 5 months ago
It's parroting human reinforcement.
danijar · 5 months ago
It actually has no human data as input and learns by itself in the environment; that's the point of the accomplishment! :)
danijar commented on DeepMind program finds diamonds in Minecraft without being taught   nature.com/articles/d4158... · Posted by u/Bender
camel-cdr · 5 months ago
Ah, is this full RL?

I was reading something about LLMs earlier and was thinking that LLMs could probably write a simple case-based script for controlling a player that could achieve a decent success rate.

danijar · 5 months ago
Yes, it's RL from scratch with sparse rewards.
danijar commented on DeepMind program finds diamonds in Minecraft without being taught   nature.com/articles/d4158... · Posted by u/Bender
lupusreal · 5 months ago
Characterizing finding diamonds as "mastering" Minecraft is extremely silly. Tantamount to saying "AI masters Chess: Captures a pawn." Getting diamonds is not even close to the hardest challenge in the game, but most readers of Nature probably don't have much experience playing Minecraft so the title is actually misleading, not harmless exaggeration.
danijar · 5 months ago
I agree with you; this is just the start, and Minecraft has a lot more to offer for future research!
danijar commented on DeepMind program finds diamonds in Minecraft without being taught   nature.com/articles/d4158... · Posted by u/Bender
zvitiate · 5 months ago
What if you were in an environment where you had to play Minecraft for, say, an hour. Do you think your child brain would've eventually tried enough things (or had your finger slip and stay on the mouse a little longer), noticed that hitting a block caused an animation (maybe even connected it with the fact that your cursor highlights individual blocks with a black box), decided to explore that further, and eventually mined a block? Your example doesn't speak to this situation at all.
danijar · 5 months ago
I think learning to hold a button down in itself isn't too hard for a human or robot that's been interacting with the physical world for a while and has learned all kinds of skills in that environment.

But for an algorithm learning from scratch in Minecraft, it's more like having to guess the cheat code for a helicopter in GTA; it's not something you'd stumble upon unless you have prior knowledge/experience.

Obviously, pretraining world models for common-sense knowledge is another important research frontier, but that's for another paper.

danijar commented on DeepMind program finds diamonds in Minecraft without being taught   nature.com/articles/d4158... · Posted by u/Bender
SpaceManNabs · 5 months ago
I just want to express my condolences for how difficult it must be to keep correcting basic misunderstandings that could be cleared up just by reading the fourth paragraph under the section "Diamonds are forever".

Thanks for your hard work.

danijar · 5 months ago
Haha thanks!
danijar commented on DeepMind program finds diamonds in Minecraft without being taught   nature.com/articles/d4158... · Posted by u/Bender
itchyjunk · 5 months ago
Since diamonds are surrounded by danger, and if it dies it loses its items and such, why would it not be satisfied after discovering the iron pickaxe or some such? Is it in a mode where it doesn't lose its items when it dies? Does it die a lot? Does it ever try digging vertically down? Does it ever discover other items/tools you didn't expect it to? Open world with sparse reward seems like such a hard problem. Also, once it gets the item, does it stop getting reward for it? I assume so. Surprised that it can work with this level of sparse rewards.
danijar · 5 months ago
When it dies it loses all items and the world resets to a new random seed. It learns to stay alive quite well but sometimes falls into lava or gets killed by monsters.

It only gets a +1 for the first iron pickaxe it makes in each world (same for all other items), so it can't hack rewards by repeating a milestone.

Yeah, it's surprising that it works from such sparse rewards. I think imagining a lot of scenarios in parallel using the world model does some of the heavy lifting here.
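For concreteness, a minimal gym-style wrapper along these lines might look like the sketch below. It assumes the environment reports an inventory dict in `info`, and the milestone names are illustrative rather than the benchmark's exact definition:

```python
import gymnasium as gym

# Illustrative milestone list; the real 12 items are defined by the benchmark,
# not by this sketch.
MILESTONES = [
    "log", "planks", "stick", "crafting_table", "wooden_pickaxe",
    "cobblestone", "stone_pickaxe", "furnace", "iron_ore",
    "iron_ingot", "iron_pickaxe", "diamond",
]

class MilestoneReward(gym.Wrapper):
    """Give +1 the first time each milestone item appears in the inventory."""

    def reset(self, **kwargs):
        self.collected = set()  # milestones already rewarded this episode
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        reward = 0.0
        inventory = info.get("inventory", {})  # assumes the env exposes an inventory dict
        for item in MILESTONES:
            if item not in self.collected and inventory.get(item, 0) > 0:
                self.collected.add(item)
                reward += 1.0  # sparse +1, only on the first occurrence per episode
        return obs, reward, terminated, truncated, info
```

Because each milestone pays out only once per episode, repeating an item earns nothing, which is what rules out reward hacking.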

danijar commented on DeepMind program finds diamonds in Minecraft without being taught   nature.com/articles/d4158... · Posted by u/Bender
reportgunner · 5 months ago
The article makes it seem like finding diamonds is some kind of super complicated logical puzzle. In reality the hardest part is knowing where to look for them and what tool you need to mine them without losing them once you find them. This was given to the AI by having it watch a video that explains it.

If you watch a guide on how to find diamonds it's really just a matter of getting an iron pickaxe, digging to the right depth and strip mining until you find some.

danijar · 5 months ago
Hi, author here! Dreamer learns to find diamonds from scratch by interacting with the environment, without access to external data. So there are no explainer videos or internet text here.

It gets a sparse reward of +1 for each of the 12 items that lead to the diamond, so there is a lot it needs to discover by itself. Fig. 5 in the paper shows the progression: https://www.nature.com/articles/s41586-025-08744-2

danijar commented on DeepMind program finds diamonds in Minecraft without being taught   nature.com/articles/d4158... · Posted by u/Bender
Animats · 5 months ago
>> Key to Dreamer’s success, says Hafner, is that it builds a model of its surroundings and uses this ‘world model’ to ‘imagine’ future scenarios and guide decision-making.

Can you look at the world model, like you can look at Waymo's world model? Or is it hidden inside weights?

Machine learning with world models is very interesting, and the people doing it don't seem to say much about what the models look like. The Google manipulation work talks endlessly about the natural language user interface, but when they get to motion planning, they don't say much.

danijar · 5 months ago
Yes, you can decode the imagined scenarios into videos and look at them. It's quite helpful during development to see what the model gets right or wrong. See Fig. 3 in the paper: https://www.nature.com/articles/s41586-025-08744-2
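Schematically, decoding imagined scenarios works something like the sketch below; the `encoder`/`dynamics`/`decoder`/`policy` names are generic placeholders rather than the actual DreamerV3 interface:

```python
# Generic sketch of inspecting a learned world model by decoding imagined
# rollouts into video frames. The callables are placeholders for whatever
# model components you have.

def imagine_video(obs, encoder, dynamics, decoder, policy, horizon=15):
    """Roll the world model forward in latent space and decode each step."""
    state = encoder(obs)             # embed the real observation into a latent state
    frames = [decoder(state)]        # what the model currently "believes" it sees
    for _ in range(horizon):
        action = policy(state)                # pick an action from the latent state
        state = dynamics(state, action)       # predict the next latent state (no env step)
        frames.append(decoder(state))         # decode the imagined latent back to pixels
    return frames                    # stack or save these to inspect the model's imagination
```

Laying the decoded frames out next to what actually happens in the environment makes it easy to spot where the model's predictions go wrong.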

u/danijar

Karma: 127 · Cake day: April 26, 2015