Readit News
drubs commented on Show HN: Beating Pokemon Red with RL and <10M Parameters   drubinstein.github.io/pok... · Posted by u/drubs
novia · 6 months ago
Can you make a twitch stream of a single agent playing?
drubs · 6 months ago
Wouldn't make much sense. We generally train with 288 environments simultaneously. I've been thinking about ways to nicely stream all 288 environments though.
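To show why one stream would capture only a sliver of a run, here is a minimal sketch of batched ("vectorized") stepping across many environments. It assumes a hypothetical `ToyEnv`, a one-dimensional walker standing in for a single emulator instance. The class names are illustrative and are not the project's code or any library's API.

```python
class ToyEnv:
    """Hypothetical stand-in for one emulator instance: a walker on a 1-D track."""

    GOAL = 10  # arbitrary episode length for illustration

    def __init__(self) -> None:
        self.position = 0

    def reset(self) -> int:
        self.position = 0
        return self.position

    def step(self, action: int) -> tuple[int, float, bool]:
        # Action 1 moves right; anything else moves left (clamped at 0).
        if action == 1:
            self.position += 1
        else:
            self.position = max(0, self.position - 1)
        done = self.position >= self.GOAL
        reward = 1.0 if done else 0.0
        return self.position, reward, done


class VecEnv:
    """Steps N independent environments in lockstep, one action per environment."""

    def __init__(self, num_envs: int) -> None:
        self.envs = [ToyEnv() for _ in range(num_envs)]

    def reset(self) -> list[int]:
        return [env.reset() for env in self.envs]

    def step(self, actions: list[int]) -> tuple[list[int], list[float], list[bool]]:
        if len(actions) != len(self.envs):
            raise ValueError("need exactly one action per environment")
        obs, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d = env.step(action)
            if d:
                o = env.reset()  # auto-reset so every slot always holds a live episode
            obs.append(o)
            rewards.append(r)
            dones.append(d)
        return obs, rewards, dones


envs = VecEnv(288)
observations = envs.reset()
```

This loop runs the environments serially for clarity. Real setups usually spread them across processes, but the batch shape is the same: each policy update sees 288 trajectories at once, so watching any one of them shows a small fraction of what the model is learning from.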
drubs commented on Reflection – AlphaGo / Gemini team building superintelligent coding agents   reflection.ai/superintell... · Posted by u/mlaskin
drubs · 6 months ago
Really excited to be a part of the team!
drubs commented on Show HN: Beating Pokemon Red with RL and <10M Parameters   drubinstein.github.io/pok... · Posted by u/drubs
wegfawefgawefg · 6 months ago
You missed my point.

I know all about RL. I've read Go-Explore 1/2, and I have personally implemented intrinsic curiosity.

I was just commenting on what the other person said, which is that it would be cool to have the NPCs be agents that battle and train too. You said they could not be made to, and I say we have the technology. :)

drubs · 6 months ago
Sounds cool to me.
drubs commented on Show HN: Beating Pokemon Red with RL and <10M Parameters   drubinstein.github.io/pok... · Posted by u/drubs
bubblyworld · 6 months ago
Thanks haha, I kept reading =D I see, so it's not just that you have to visit the key areas, they need to show up in the episodes enough to provide a signal for training.
drubs · 6 months ago
Yup!
drubs commented on Show HN: Beating Pokemon Red with RL and <10M Parameters   drubinstein.github.io/pok... · Posted by u/drubs
throwaway314155 · 6 months ago
Awesome! Why do you think the reward for reading signs helped? I'm assuming the model doesn't gain the ability to read and understand English just from RL, so what purpose does it serve other than to maybe waste ticks on signs that ultimately don't need to be read?
drubs · 6 months ago
It's silly, but signs were a way to incentivize the agent to explore deeper into the Safari Zone among other areas.
drubs commented on Show HN: Beating Pokemon Red with RL and <10M Parameters   drubinstein.github.io/pok... · Posted by u/drubs
wegfawefgawefg · 6 months ago
You don't port it, you wrap it. You can put anything in an RL environment. Usually emulators are done with BizHawk and some Lua. Worst case, there's FFI or screen capture.
drubs · 6 months ago
My first version of this project, 5 years ago, actually used a Python-Lua named pipe with BizHawk. No clue where that code went.
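The "wrap it, don't port it" idea can be sketched as a small reset/step adapter around an emulator handle. This assumes a hypothetical emulator object with `load_state`, `save_state`, `press`, `tick`, and `read` methods. These are not BizHawk's or any other emulator's real API. The RAM addresses and frames-per-action value are placeholders; real addresses would come from a symbol map such as the one a pret/pokered build produces.

```python
BUTTONS = ("up", "down", "left", "right", "a", "b", "start")

# Placeholder RAM addresses for the player's map and coordinates (hypothetical).
MAP_ADDR, X_ADDR, Y_ADDR = 0x00, 0x01, 0x02


class FakeEmulator:
    """Hypothetical emulator handle exposing only the calls a wrapper needs."""

    def __init__(self) -> None:
        self.ram = bytearray(0x100)
        self.frame = 0

    def save_state(self) -> bytes:
        return bytes(self.ram)

    def load_state(self, state: bytes) -> None:
        self.ram = bytearray(state)
        self.frame = 0

    def press(self, button: str) -> None:
        # Toy movement: left/right change X, up/down change Y (wrapping at 256).
        dx = {"left": -1, "right": 1}.get(button, 0)
        dy = {"up": -1, "down": 1}.get(button, 0)
        self.ram[X_ADDR] = (self.ram[X_ADDR] + dx) % 256
        self.ram[Y_ADDR] = (self.ram[Y_ADDR] + dy) % 256

    def tick(self, frames: int) -> None:
        self.frame += frames

    def read(self, addr: int) -> int:
        return self.ram[addr]


class EmulatorEnv:
    """Wraps an emulator in a reset/step interface instead of porting the game."""

    def __init__(self, emulator: FakeEmulator, initial_state: bytes,
                 frames_per_action: int = 24) -> None:  # 24 is an arbitrary placeholder
        self.emu = emulator
        self.initial_state = initial_state
        self.frames_per_action = frames_per_action

    def reset(self) -> tuple[int, int, int]:
        self.emu.load_state(self.initial_state)  # every episode starts from a saved state
        return self._observe()

    def step(self, action: int) -> tuple[tuple[int, int, int], float, bool]:
        self.emu.press(BUTTONS[action])        # translate the action index into input
        self.emu.tick(self.frames_per_action)  # let the game advance
        obs = self._observe()
        reward = 0.0                           # reward shaping would go here
        return obs, reward, False

    def _observe(self) -> tuple[int, int, int]:
        # The observation is read straight out of emulator RAM.
        return (self.emu.read(MAP_ADDR), self.emu.read(X_ADDR), self.emu.read(Y_ADDR))
```

Whether the handle talks to the emulator in-process, over FFI, or over a named pipe to a Lua script, the environment only needs these few operations.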
drubs commented on Show HN: Beating Pokemon Red with RL and <10M Parameters   drubinstein.github.io/pok... · Posted by u/drubs
kerkeslager · 6 months ago
Are there any uses for AI yet that aren't either:

1. Doing things humans do for fun, or
2. Doing things that AI is horribly terrible at?

drubs · 6 months ago
There are a ton of applications for AI. Back when I was at Spotify, I co-authored Basic Pitch (https://basicpitch.spotify.com/), an audio-to-MIDI library. There are plenty of uses for AI outside of what's heavily publicized.
drubs commented on Show HN: Beating Pokemon Red with RL and <10M Parameters   drubinstein.github.io/pok... · Posted by u/drubs
mclau156 · 6 months ago
Could you have used the Pokémon decompilations on GitHub? https://github.com/pret/pokered
drubs · 6 months ago
There's an entire section on how the decompilations were used :)
drubs commented on Show HN: Beating Pokemon Red with RL and <10M Parameters   drubinstein.github.io/pok... · Posted by u/drubs
levocardia · 6 months ago
Really cool work. It seems like some critical areas (team rocket, safari zone) rely on encoding game knowledge into the reward function somehow, which "smuggles in" external intelligence about the game. A lot of these are related to planning, which makes me wonder whether you could "bolt on" an LLM to do things like steer the RL agent, dynamically choose what to reward, or even do some of the planning itself. Do you think there's any low-hanging fruit on this front?
drubs · 6 months ago
Wrote about this in the results section. I think there is a way to mix the two and simplify the rewards in the process. A lot of the magic behind getting the agent to teach and use Cut probably could have been handled by an LLM.
drubs commented on Show HN: Beating Pokemon Red with RL and <10M Parameters   drubinstein.github.io/pok... · Posted by u/drubs
bubblyworld · 6 months ago
What an awesome project! I'm curious - I would have thought that rewarding unique coordinates would be enough to get the agent to (eventually) explore all areas, including the key ones. What did the agents end up doing before key areas got an extra reward?

(and how on earth did you port Pokémon Red to an RL environment? O.o)

drubs · 6 months ago
The environments wouldn't concentrate enough in the Rocket Hideout beneath Celadon Game Corner. Instead, the agent would have the player wander the world, reward hacking. With wild battles enabled, the environments would end up in Lavender Tower fighting Gastly.

> (and how on earth did you port Pokémon Red to an RL environment? O.o)

Read and find out :)
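This exchange contrasts a plain unique-coordinate reward with one that gives key areas extra reward. The sketch below assumes the bonus is paid once per newly visited (map, x, y) tile per episode. The bonus sizes and map labels are illustrative, not the project's actual values.

```python
class ExplorationReward:
    """Count-based bonus: pay once per newly visited (map, x, y) tile in an episode."""

    def __init__(self, tile_bonus: float = 0.01, key_area_bonus: float = 1.0,
                 key_maps=frozenset()) -> None:
        self.tile_bonus = tile_bonus          # illustrative value
        self.key_area_bonus = key_area_bonus  # illustrative value
        self.key_maps = frozenset(key_maps)   # maps that deserve extra weight
        self.seen: set[tuple[str, int, int]] = set()

    def reset(self) -> None:
        self.seen.clear()  # call at the start of every episode

    def __call__(self, map_name: str, x: int, y: int) -> float:
        tile = (map_name, x, y)
        if tile in self.seen:
            return 0.0  # revisits earn nothing, so walking in circles stops paying
        self.seen.add(tile)
        return self.key_area_bonus if map_name in self.key_maps else self.tile_bonus


reward = ExplorationReward(key_maps={"ROCKET_HIDEOUT_B1F"})
```

With only a uniform tile bonus, a large stretch of overworld pays the same per tile as the hideout. That fits the behavior described above, where the agent wandered the world instead of concentrating in the Rocket Hideout. Weighting the key maps is what makes those tiles show up often enough to provide a training signal.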

u/drubs

Karma: 75 · Cake day: March 2, 2025
About
Making models go brr

https://github.com/drubinstein https://x.com/dsrubinstein
