tehsauce commented on World Emulation via Neural Network   madebyoll.in/posts/world_... · Posted by u/treesciencebot
tehsauce · 8 months ago
I love this! Your results seem comparable to the Counter-Strike and Minecraft world models from a while back, with massively less compute and data. It's particularly cool that it uses real-world data. I've been wanting to do something like this for a while, like capturing a large dataset while backpacking in the Cascades :)

I didn't see it in an obvious place on your GitHub; do you have any plans to open-source the training code?

tehsauce commented on People are just as bad as my LLMs   wilsoniumite.com/2025/03/... · Posted by u/Wilsoniumite
tehsauce · 9 months ago
There has been some good research published on how RLHF, i.e. aligning models to human preferences, easily introduces mode collapse and bias. For example, with a prompt like "Choose a random number", the base pretrained model can give relatively random answers, but after fine-tuning to produce responses humans like, models become heavily biased toward answers like "7" or "42".
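As a toy illustration of what that collapse looks like, here's a sketch that tallies answer frequencies. `sample_number` is a hypothetical stand-in for prompting each model, and the distributions are made up for illustration, not measured:

```python
import random
from collections import Counter

def sample_number(model: str) -> str:
    # Hypothetical stand-in for prompting a model with "Choose a random
    # number"; both distributions are illustrative, not measured.
    if model == "base":
        return str(random.randint(0, 99))  # base model: near-uniform
    return random.choices(                 # RLHF model: mode-collapsed
        ["7", "42", "37", "3"], weights=[50, 30, 15, 5]
    )[0]

for model in ("base", "rlhf"):
    counts = Counter(sample_number(model) for _ in range(1_000))
    print(model, counts.most_common(3))
```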
tehsauce commented on Show HN: Beating Pokemon Red with RL and <10M Parameters   drubinstein.github.io/pok... · Posted by u/drubs
novia · 9 months ago
Please stream the gameplay to Twitch so people can compare.
tehsauce · 9 months ago
We have a shared community map where you can watch hundreds of agents from multiple people's training runs playing in real time!

https://pwhiddy.github.io/pokerl-map-viz/

tehsauce commented on Show HN: Beating Pokemon Red with RL and <10M Parameters   drubinstein.github.io/pok... · Posted by u/drubs
modeless · 9 months ago
Can't Pokemon be beaten by almost random play?
tehsauce · 9 months ago
It's impossible to beat with random actions or brute force, but you can get surprisingly far. It doesn't take too long to get halfway through Route 1, but even with insane compute you'll never even make it to Viridian Forest.
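For a sense of what a "random play" baseline means here, a minimal sketch; `ToyPokemonEnv` is a made-up stub standing in for a real emulator wrapper, not the project's actual API:

```python
import random

ACTIONS = ["up", "down", "left", "right", "a", "b", "start"]

class ToyPokemonEnv:
    # Made-up stub in place of a real Game Boy emulator wrapper.
    def reset(self):
        self.steps = 0

    def step(self, action: str) -> float:
        self.steps += 1
        # Pretend exploration reward: new progress gets rarer over time.
        return 1.0 if random.random() < 1.0 / self.steps else 0.0

env = ToyPokemonEnv()
env.reset()
progress = sum(env.step(random.choice(ACTIONS)) for _ in range(100_000))
print(f"random play: {progress:.0f} units of progress")  # quickly plateaus
```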
tehsauce commented on Claude Plays Pokémon   twitch.tv/claudeplayspoke... · Posted by u/LightMachine
tehsauce · 10 months ago
If anyone is interested in watching lots of reinforcement learning agents playing Pokémon Red at once, we have a website that streams hundreds of concurrent games from multiple people’s training runs to a shared map in real time!

https://pwhiddy.github.io/pokerl-map-viz/

(works best on desktop)

tehsauce commented on Ask HN: Resources for general purpose GPU development on Apple's M* chips?    · Posted by u/thinking_banana
grovesNL · a year ago
wgpu has its own Metal backend that most people use by default (not MoltenVK).

There is also a Vulkan backend if you want to run Vulkan through MoltenVK though.

tehsauce · a year ago
The Metal backend does currently generate quite a lot of unnecessary command buffers, but in general performance seems solid.
tehsauce commented on Were RNNs all we needed?   arxiv.org/abs/2410.01201... · Posted by u/beefman
tehsauce · a year ago
I haven’t gone through the paper in detail yet, but maybe someone can answer: if you remove the hidden state from an RNN, as they say they’ve done, what’s left? An MLP predicting from a single token?
tehsauce commented on     · Posted by u/Hyper_Spire
tehsauce · a year ago
The water consumed to produce a single hamburger is over 2,000 liters, and the energy likely well over 100 watt-hours.

That means GPT can write over 1,000 emails using the resources it takes to feed a single person lunch. The resource efficiency of these machines is already really quite astonishing.
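A back-of-envelope version of that comparison, using the burger figures above; the per-email costs are rough assumptions for illustration, not measurements:

```python
burger_water_l = 2_000    # liters of water per hamburger (figure above)
burger_energy_wh = 100    # watt-hours per hamburger (figure above)

email_water_l = 2.0       # assumed liters of water per GPT email
email_energy_wh = 0.1     # assumed watt-hours per GPT email

emails = min(burger_water_l / email_water_l,
             burger_energy_wh / email_energy_wh)
print(f"one burger ~= {emails:.0f} GPT-written emails")  # ~1000
```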

tehsauce commented on The GJK Algorithm: A weird and beautiful way to do a simple thing   computerwebsite.net/writi... · Posted by u/arithmoquine
tehsauce · 2 years ago
Awesome article! One thing is slightly misleading, though: the first image shows the intersection of a non-convex shape, but it isn't revealed until much later that the algorithm only works for convex shapes, not the type shown in the first image.
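To see why convexity matters: GJK is built on a support function like the sketch below (a sketch of mine, not the article's code). For a concave vertex set it can only ever return hull points, so the algorithm effectively operates on the shape's convex hull:

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]

def support(vertices: List[Vec2], d: Vec2) -> Vec2:
    """Farthest vertex in direction d, the primitive GJK iterates on."""
    return max(vertices, key=lambda v: v[0] * d[0] + v[1] * d[1])

# Concave "L" shape: the reflex corner (1, 1) is never the farthest
# point in any direction, so GJK only ever sees the convex hull.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(support(l_shape, (1.0, 1.0)))  # (2, 1), never (1, 1)
```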
tehsauce commented on Grokked Transformers Are Implicit Reasoners   arxiv.org/abs/2405.15071... · Posted by u/jasondavies
Scene_Cast2 · 2 years ago
I just learned about grokking; it reminds me of double descent, and I looked up a 2022 paper called "Unifying grokking and double descent". I'm still unclear on what the difference is. My basic understanding of double descent was that the regularization loss makes the model focus on regularization after fitting the training data.
tehsauce · 2 years ago
Grokking is a sudden, large jump in test accuracy with increasing training steps, well after training accuracy has fully converged. Double descent is test performance increasing, decreasing, and then finally rising again as the number of model parameters is increased.
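A toy sketch of the distinction, with made-up curves: grokking lives on the training-step axis, double descent on the model-size axis:

```python
# Grokking: test accuracy jumps long after train accuracy saturates.
steps     = [1_000 * (i + 1) for i in range(10)]
train_acc = [0.99] * 10                       # converged almost immediately
test_acc  = [0.10] * 7 + [0.50, 0.95, 0.99]   # sudden late jump

grok_step = next(s for s, a in zip(steps, test_acc) if a > 0.9)
print(f"grokking transition near step {grok_step}")

# Double descent: test error falls, rises, then falls again as the
# parameter count grows past the interpolation threshold.
params   = [1e4, 1e5, 1e6, 1e7, 1e8]
test_err = [0.30, 0.20, 0.35, 0.15, 0.08]     # dip, peak, second descent
print(min(zip(test_err, params)))             # best error at largest size
```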

u/tehsauce

Karma: 1494 · Cake day: December 10, 2016
About
https://github.com/PWhiddy