Readit News
scottmsul commented on Sycophancy in GPT-4o   openai.com/index/sycophan... · Posted by u/dsr12
scottmsul · 4 months ago
Or you could, you know, let people have access to the base model and engineer their own system prompts? Instead of us hoping you tweak the only allowed prompt to something everyone likes?

So much for "open" AI...

scottmsul commented on Show HN: Factorio Learning Environment – Agents Build Factories   jackhopkins.github.io/fac... · Posted by u/noddybear
scottmsul · 6 months ago
There was an HN post here not too long ago about a team that used reinforcement learning to train an agent to beat Pokemon Red. They mentioned how they had to tweak the reward function to give small rewards for exploring and big rewards for completing "essential tasks" like beating gyms.

I wonder if this same approach could be used here in Factorio? Using the Pokemon Red analogy, the main "essential tasks" in Factorio are setting up automation for new items and new science packs. I think a good reward function could involve small rewards for the production rate (items/sec) of each item, medium rewards for setting up automation for new items, and big rewards for automating each new science pack.

Telling a Factorio agent to just "make a big factory" is like telling a Pokemon Red agent to just "beat the game". It has to be broken down into smaller steps with a very carefully tuned reward function.
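
To make the tiered idea above concrete, here is a minimal sketch of what such a shaped reward might look like. Everything in it is an assumption for illustration: the weights are made up and untuned, `RewardTracker` and `step_reward` are hypothetical names, and it does not use the Factorio Learning Environment's actual API.

```python
from dataclasses import dataclass, field

# Hypothetical weights; the magnitudes are illustrative, not tuned.
RATE_WEIGHT = 0.01        # small: per item/sec of ongoing production
NEW_ITEM_BONUS = 1.0      # medium: first time a regular item is automated
NEW_SCIENCE_BONUS = 10.0  # big: first time a science pack is automated


@dataclass
class RewardTracker:
    """Remembers which items have already earned their one-time bonus."""
    automated_items: set = field(default_factory=set)

    def step_reward(self, production_rates, science_packs):
        """Return the shaped reward for one step.

        production_rates: dict mapping item name -> automated items/sec.
        science_packs: set of item names that count as science packs.
        """
        reward = 0.0
        for item, rate in production_rates.items():
            if rate <= 0:
                continue  # nothing is being produced, so no reward
            # Small reward proportional to the current production rate.
            reward += RATE_WEIGHT * rate
            # One-time milestone bonus the first time an item is automated.
            if item not in self.automated_items:
                self.automated_items.add(item)
                if item in science_packs:
                    reward += NEW_SCIENCE_BONUS
                else:
                    reward += NEW_ITEM_BONUS
        return reward
```

The one-time bonuses are kept in a set so the agent can't farm them by tearing down and rebuilding the same production line; only the small rate term keeps paying out.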

Thinking about this is really making me want to jump into this project!

scottmsul · 6 months ago
Also I should add, being a Factorio veteran with 2-3k hours in this game, I think the goal of making the "largest possible factory" is too vague and not the right metric. When Factorio players make large megabases, they don't go for "size" per se, but rather science per minute (SPM). The metric you should be giving the agents is SPM, not "largest" base!

scottmsul commented on Show HN: Play with real quantum physics in your browser   quantum.orgsoft.org... · Posted by u/mattvr
scottmsul · 7 months ago
I just see a bunch of spinning coins forever and nothing happens with no way to stop it...
scottmsul commented on The bunkbed conjecture is false   igorpak.wordpress.com/202... · Posted by u/surprisetalk
yread · a year ago
I don't know. To me it sounds like it's obvious the conjecture doesn't hold. If you only have a path in the upper bunk, then if you have a path in the upper bunk that gets broken you are screwed, but if that path is broken and you have the lower bunk too, you have the option to switch to a path in the lower bunk at ANY moment. So you have n+1/n higher chance of a break but n-1 ways to avoid it
scottmsul · a year ago
> if you have a path in the upper bunk that gets broken you are screwed

Counter-factuals like this don't apply when talking about average probabilities. If you cross over, it's an identical graph with identical probabilities. idk, to me it seems really counter-intuitive that the opposite bunk's node would be easier to get to than the current bunk's node.

scottmsul commented on Project Hammer: reduce collusion in the Canadian grocery sector   jacobfilipp.com/hammer/... · Posted by u/surprisetalk
scottmsul · a year ago
Suppose all the grocery stores raised the price of item X by exactly the same percent Y on the same day. At first glance this would look like perfect collusion in the data. But how do you know it wasn't just the supplier price going up by Y percent?

Dead Comment

scottmsul commented on What do scientists tell us about boxing's gender row?   bbc.com/news/articles/crl... · Posted by u/YeGoblynQueenne
intellix · a year ago
How about people like Michael Phelps who were born with swimming advantages? Was reading an article years and years ago how his body is naturally made for swimming.
scottmsul · a year ago
Suppose we took this line of thinking to its natural conclusion. If we wanted to be completely neutral, in theory we could have no male/female division at all, and just compete for the "best human" in each sport. But because of the vast biological differences between men and women, men would win every single time, hands down. This goes back to why we have men's and women's divisions in the first place: we want a space for people without the vast advantages of male physiology to compete fairly with each other. Allowing people who are technically "women" based on their reproductive anatomy, but who otherwise have all the male physiological advantages, feels like it defeats the entire purpose of having separate women's divisions.
scottmsul commented on Is the frequency domain a real place?   lcamtuf.substack.com/p/is... · Posted by u/zdw
scottmsul · a year ago
Except sinusoids are special in that they are natural solutions to the Helmholtz wave equation. There are other problems too, like square waves having infinite energy. This article might make sense to a mathematician or computer scientist but neglects the underlying physics of sound and waves.
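
To spell out the Helmholtz point, here is a short sketch, assuming the standard linear wave equation with a constant wave speed c (the symbols u, A, c, k and ω are my notation, not the article's). Plugging in a solution that oscillates at a single frequency ω turns the wave equation into the Helmholtz equation:

```latex
\begin{align}
\frac{\partial^2 u}{\partial t^2} &= c^2 \nabla^2 u, \\
u(\mathbf{x}, t) = A(\mathbf{x})\, e^{-i\omega t}
\;&\Longrightarrow\;
\nabla^2 A + k^2 A = 0, \qquad k = \frac{\omega}{c}.
\end{align}
```

In one dimension the solutions are A(x) = e^{±ikx}, i.e. sinusoids in space as well as in time, which is the sense in which they fall out of the physics rather than being an arbitrary choice of basis.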

u/scottmsul

Karma: 553 · Cake day: July 20, 2015