tehsauce commented on World Emulation via Neural Network   madebyoll.in/posts/world_... · Posted by u/treesciencebot
tehsauce · 8 months ago
I love this! Your results seem comparable to the Counter-Strike and Minecraft world models from a while back, with massively less compute and data. It's particularly cool that it uses real-world data. I've been wanting to do something like this for a while, like capturing a large dataset while backpacking in the Cascades :)

I didn't see it in an obvious place on your GitHub; do you have any plans to open-source the training code?

tehsauce commented on People are just as bad as my LLMs   wilsoniumite.com/2025/03/... · Posted by u/Wilsoniumite
tehsauce · 9 months ago
There has been some good research published on how RLHF, i.e. aligning models to human preferences, easily introduces mode collapse and bias. For example, with a prompt like "Choose a random number", the base pretrained model can give relatively random answers, but after fine-tuning to produce responses humans like, models become heavily biased toward answers like "7" or "42".
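As a toy illustration of what that collapse looks like, here's a sketch that tallies answer frequencies. `sample_number` is a hypothetical stand-in for prompting each model, and the distributions are made up for illustration, not measured:

```python
import random
from collections import Counter

def sample_number(model: str) -> str:
    # Hypothetical stand-in for prompting a model with "Choose a random
    # number"; both distributions are illustrative, not measured.
    if model == "base":
        return str(random.randint(0, 99))  # base model: near-uniform
    return random.choices(                 # RLHF model: mode-collapsed
        ["7", "42", "37", "3"], weights=[50, 30, 15, 5]
    )[0]

for model in ("base", "rlhf"):
    counts = Counter(sample_number(model) for _ in range(1_000))
    print(model, counts.most_common(3))
```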
tehsauce commented on Show HN: Beating Pokemon Red with RL and <10M Parameters   drubinstein.github.io/pok... · Posted by u/drubs
novia · 9 months ago
Please stream the gameplay to Twitch so people can compare.
tehsauce · 9 months ago
We have a shared community map where you can watch hundreds of agents from multiple people's training runs playing in real time!

https://pwhiddy.github.io/pokerl-map-viz/

tehsauce commented on Show HN: Beating Pokemon Red with RL and <10M Parameters   drubinstein.github.io/pok... · Posted by u/drubs
modeless · 9 months ago
Can't Pokemon be beaten by almost random play?
tehsauce · 9 months ago
It's impossible to beat with random actions or brute force, but you can get surprisingly far. It doesn't take too long to get halfway through Route 1, but even with insane compute you'll never even make it to Viridian Forest.
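For a sense of what a "random play" baseline means here, a minimal sketch; `ToyPokemonEnv` is a made-up stub standing in for a real emulator wrapper, not the project's actual API:

```python
import random

ACTIONS = ["up", "down", "left", "right", "a", "b", "start"]

class ToyPokemonEnv:
    # Made-up stub in place of a real Game Boy emulator wrapper.
    def reset(self):
        self.steps = 0

    def step(self, action: str) -> float:
        self.steps += 1
        # Pretend exploration reward: new progress gets rarer over time.
        return 1.0 if random.random() < 1.0 / self.steps else 0.0

env = ToyPokemonEnv()
env.reset()
progress = sum(env.step(random.choice(ACTIONS)) for _ in range(100_000))
print(f"random play: {progress:.0f} units of progress")  # quickly plateaus
```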
tehsauce commented on Claude Plays Pokémon   twitch.tv/claudeplayspoke... · Posted by u/LightMachine
tehsauce · 10 months ago
If anyone is interested in watching lots of reinforcement learning agents playing Pokémon Red at once, we have a website that streams hundreds of concurrent games from multiple people’s training runs to a shared map in real time!

https://pwhiddy.github.io/pokerl-map-viz/

(works best on desktop)

tehsauce commented on Ask HN: Resources for general purpose GPU development on Apple's M* chips?    · Posted by u/thinking_banana
grovesNL · a year ago
wgpu has its own Metal backend that most people use by default (not MoltenVK).

There is also a Vulkan backend if you want to run Vulkan through MoltenVK though.

tehsauce · a year ago
The Metal backend does currently generate quite a lot of unnecessary command buffers, but in general performance seems solid.
tehsauce commented on Were RNNs all we needed?   arxiv.org/abs/2410.01201... · Posted by u/beefman
tehsauce · a year ago
I haven’t gone through the paper in detail yet, but maybe someone can answer: if you remove the hidden state from an RNN, as they say they’ve done, what’s left? An MLP predicting from a single token?
tehsauce commented on     · Posted by u/Hyper_Spire
tehsauce · a year ago
The water consumed to produce a single hamburger is over 2,000 liters, and the energy likely well over 100 watt-hours.

That means GPT can write over 1,000 emails using the resources it takes to feed a single person lunch. The resource efficiency of these machines is already really quite astonishing.
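A back-of-envelope version of that comparison, using the burger figures above; the per-email costs are rough assumptions for illustration, not measurements:

```python
burger_water_l = 2_000    # liters of water per hamburger (figure above)
burger_energy_wh = 100    # watt-hours per hamburger (figure above)

email_water_l = 2.0       # assumed liters of water per GPT email
email_energy_wh = 0.1     # assumed watt-hours per GPT email

emails = min(burger_water_l / email_water_l,
             burger_energy_wh / email_energy_wh)
print(f"one burger ~= {emails:.0f} GPT-written emails")  # ~1000
```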

tehsauce commented on The GJK Algorithm: A weird and beautiful way to do a simple thing   computerwebsite.net/writi... · Posted by u/arithmoquine
tehsauce · 2 years ago
Awesome article! One thing is slightly misleading, though: the first image shows the intersection of a non-convex shape, but it isn't revealed until much later that the algorithm only works for convex shapes, not the type shown in the first image.
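To see why convexity matters: GJK is built on a support function like the sketch below (a sketch of mine, not the article's code). For a concave vertex set it can only ever return hull points, so the algorithm effectively operates on the shape's convex hull:

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]

def support(vertices: List[Vec2], d: Vec2) -> Vec2:
    """Farthest vertex in direction d, the primitive GJK iterates on."""
    return max(vertices, key=lambda v: v[0] * d[0] + v[1] * d[1])

# Concave "L" shape: the reflex corner (1, 1) is never the farthest
# point in any direction, so GJK only ever sees the convex hull.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(support(l_shape, (1.0, 1.0)))  # (2, 1), never (1, 1)
```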
tehsauce commented on Grokked Transformers Are Implicit Reasoners   arxiv.org/abs/2405.15071... · Posted by u/jasondavies
Scene_Cast2 · 2 years ago
I just learned about grokking; it reminds me of double descent, and I looked up a 2022 paper called "Unifying grokking and double descent". I'm still unclear on what the difference is. My basic understanding of double descent was that the regularization loss makes the model focus on regularization after fitting the training data.
tehsauce · 2 years ago
Grokking is a sudden, large jump in test accuracy with increasing training steps, well after training accuracy has fully converged. Double descent is test performance increasing, decreasing, and then finally rising again as the number of model parameters is increased.
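A toy sketch of the distinction, with made-up curves: grokking lives on the training-step axis, double descent on the model-size axis:

```python
# Grokking: test accuracy jumps long after train accuracy saturates.
steps     = [1_000 * (i + 1) for i in range(10)]
train_acc = [0.99] * 10                       # converged almost immediately
test_acc  = [0.10] * 7 + [0.50, 0.95, 0.99]   # sudden late jump

grok_step = next(s for s, a in zip(steps, test_acc) if a > 0.9)
print(f"grokking transition near step {grok_step}")

# Double descent: test error falls, rises, then falls again as the
# parameter count grows past the interpolation threshold.
params   = [1e4, 1e5, 1e6, 1e7, 1e8]
test_err = [0.30, 0.20, 0.35, 0.15, 0.08]     # dip, peak, second descent
print(min(zip(test_err, params)))             # best error at largest size
```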

u/tehsauce

Karma: 1494 · Cake day: December 10, 2016
About
https://github.com/PWhiddy